Adaptive and interacting Markov chain Monte Carlo (MCMC) algorithms have recently been introduced in the literature. These novel simulation algorithms are designed to increase the efficiency of simulation from complex distributions. Motivated by some recently introduced algorithms (such as the adaptive Metropolis algorithm and the interacting tempering algorithm), we develop a general methodological and theoretical framework to establish both the convergence of the marginal distribution and a strong law of large numbers. This framework weakens the conditions introduced in the pioneering paper by Roberts and Rosenthal [J. Appl. Probab. 44 (2007) 458-475]. It also covers the case when the target distribution $\pi$ is sampled by using Markov transition kernels with a stationary distribution that differs from $\pi$.

1. Introduction. Markov chain Monte Carlo (MCMC) methods generate samples from an arbitrary distribution $\pi$ known up to a scaling factor; see Robert and Casella (2004). The algorithm consists of sampling a Markov chain $\{X_n, n \ge 0\}$ on a general state space $\mathsf{X}$ with a Markov transition kernel $P$ admitting $\pi$ as its unique invariant distribution.

In most implementations of MCMC algorithms, the transition kernel $P$ of the Markov chain depends on a tuning parameter $\theta$ defined on a space $\Theta$ which can be either finite dimensional or infinite dimensional. Consider, for example, the Metropolis algorithm [Metropolis et al. (1953)]. Here $\mathsf{X} = \mathbb{R}^d$ and the stationary distribution is assumed to have a density, also denoted by $\pi$, with respect to a measure. At iteration $n$, a move $Z_{n+1} = X_n + U_{n+1}$ is proposed, where $U_{n+1}$ is drawn, independently of $X_0, \dots, X_n$, from a symmetric distribution $q$ on $\mathbb{R}^d$. This move is accepted with probability $\alpha(X_n, Z_{n+1})$, where $\alpha(x, y) = 1 \wedge (\pi(y)/\pi(x))$. A frequently advocated choice of the increment distribution $q$ is the multivariate normal with zero mean and covariance matrix $(2.38^2/d)\Gamma_\pi$, where $\Gamma_\pi$ is the covariance matrix of the target distribution $\pi$ [see Gelman, Roberts and Gilks (1996)].
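As an illustration of the Metropolis move just described, here is a minimal sketch in Python (standard library only) of one random-walk step with Gaussian increments of covariance $(2.38^2/d)\Gamma$. The helper names `cholesky` and `rwm_step` are hypothetical, and the matrix `gamma` is assumed to be supplied by the user; in practice $\Gamma_\pi$ is unknown, which is what motivates the adaptive scheme discussed next.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def rwm_step(x, log_pi, gamma, rng):
    """One Metropolis step: propose Z = x + U with U ~ N(0, (2.38^2/d) gamma) and
    accept with probability 1 ∧ pi(Z)/pi(x). Returns (new_state, accepted)."""
    d = len(x)
    scaled = [[(2.38 ** 2 / d) * g for g in row] for row in gamma]
    low = cholesky(scaled)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    u = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
    y = [xi + ui for xi, ui in zip(x, u)]
    # log(pi(y)/pi(x)): the unknown normalizing constant of pi cancels.
    log_ratio = log_pi(y) - log_pi(x)
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return y, True
    return list(x), False


if __name__ == "__main__":
    # Usage: standard Gaussian target in d = 2, with gamma equal to its covariance.
    rng = random.Random(1)
    log_pi = lambda v: -0.5 * sum(t * t for t in v)
    x, n_acc, n = [0.0, 0.0], 0, 5000
    for _ in range(n):
        x, acc = rwm_step(x, log_pi, [[1.0, 0.0], [0.0, 1.0]], rng)
        n_acc += acc
    print("acceptance rate:", n_acc / n)
```

Working on the log scale avoids overflow when $\pi$ takes very small values; the Cholesky factor is recomputed at every call only to keep the sketch self-contained.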

Of course, $\Gamma_\pi$ is unknown. In Haario, Saksman and Tamminen (1999), the authors proposed an adaptive Metropolis (AM) algorithm in which the covariance $\Gamma_n$ is updated at each iteration using the past values of the simulations [see also Haario, Saksman and Tamminen (2001), Haario et al. (2004, 2006), Laine and Tamminen (2008) for applications].

The adaptive Metropolis algorithm is an example in which a parameter $\theta_{n+1}$ is updated at each iteration from the values of the chain $\{X_0, \dots, X_{n+1}\}$ and the past values of the parameters $\{\theta_0, \dots, \theta_n\}$. Many other examples of such adaptive MCMC algorithms are presented in Andrieu and Thoms (2008), Rosenthal (2009) and Atchade et al. (2011).

When attempting to simulate from a density with multiple modes, the Markov kernel might mix very slowly. A useful solution to that problem is to introduce a temperature parameter. This idea is exploited in parallel tempering: several Metropolis algorithms are run at different temperatures [see Geyer (1991), Atchade, Roberts and Rosenthal (2011)]. One of the simulations, corresponding to $T_1 = 1$, targets the desired probability distribution. The other simulations correspond to the family of tempered distributions $\pi^{1/T_i}$, $i \in \{1, \dots, K\}$, created by gradually increasing the temperature.

The interacting tempering algorithm, a simplified form of the equi-energy sampler introduced by Kou, Zhou and Wong (2006), exploits the parallel tempering idea. Both algorithms run several chains in parallel, but the interacting tempering algorithm allows more general interactions between chains. The interacting tempering algorithm provides an example in which the process of interest interacts with the past samples of a family of auxiliary processes. Other examples of such interacting schemes are presented in Andrieu et al. (2007) [see also Brockwell, Del Moral and Doucet (2010)].

The two examples discussed above can be put into a common unifying framework (see Section 2). The purpose of this work is to analyze these general classes of adaptive and interacting MCMC. This paper complements recent surveys on this topic by Andrieu and Thoms (2008), Rosenthal (2009) and Atchade et al. (2011), which are devoted to the design of these algorithms. We focus in this paper on two problems: the ergodicity of the sampler (under which conditions the marginal distribution of the process converges to the target distribution $\pi$) and the strong law of large numbers (SLLN) for additive and unbounded functionals.
Ergodicity of the marginal distributions for adaptive MCMC has been
studied by Andrieu and Moulines (2006) for a particular class of samplers in 
which the parameter is adapted using a stochastic approximation algorithm. 
These results have later been extended by Roberts and Rosenthal (2007) 
to handle more general adaptation strategies, but under conditions which 
are in some respects more stringent. Most of these works assume a form of 
geometric ergodicity; these conditions are relaxed in Atchade and Fort (2010) 
which addresses Markov chains with subgeometric rate of convergence. 

A strong law of large numbers for the adaptive Metropolis algorithm was established by Haario, Saksman and Tamminen (2001) (for bounded functions and a compact parameter space $\Theta$), using mixingale techniques; these results were later extended by Atchade and Rosenthal (2005) to unbounded functions and a compact parameter space $\Theta$. The LLN for unbounded functions and a noncompact parameter space has been established recently in Saksman and Vihola (2010). Andrieu and Moulines (2006) have established the consistency and the asymptotic normality of $n^{-1}\sum_{k=1}^{n} f(X_k)$ for bounded and unbounded functions for adaptive MCMC algorithms combined with a stochastic approximation procedure [see Atchade and Fort (2010) for extensions]. The procedure involves projections on a family of increasing compact subsets of the parameter space, and does not cover the results obtained for the AM by Saksman and Vihola (2010).

Roberts and Rosenthal (2007) prove a weak law of large numbers for 
bounded functions for general adaptive MCMC samplers but under technical 
conditions which are stringent. 

The analysis of interacting MCMC algorithms started more recently and the theory is still less developed. The original result in Kou, Zhou and Wong [(2006), Theorem 2], as already noted in the discussion paper [Atchade and Liu (2006), Section 3] and carefully explained in Andrieu et al. [(2008), Section 3.1], does not amount to a proof. Andrieu et al. (2008) present a proof of convergence of a simple version of the interacting tempering sampler with $K = 2$ stages. The proofs in Andrieu et al. (2008) (uniformly ergodic case) and in Andrieu et al. (2011) (geometrically ergodic case) are based on the convergence of $U$-statistics, which explains why the results obtained for $K = 2$ stages cannot easily be extended.

An SLLN was established by Atchade (2010) for a simple version of the interacting tempering algorithm for a transition kernel which is geometrically
ergodic with uniformly controlled ergodicity constants, but the proof in this 
paper is not convincing [see Fort, Moulines and Priouret (2011), Section 1]. 

Finally, a functional central limit theorem was derived in Bercu, Del Moral and Doucet (2009) for a class of interacting Markov chains for uniformly ergodic Markov kernels.

This paper aims at providing a theory that weakens some of the limitations mentioned above. Let $\{P_\theta, \theta \in \Theta\}$ be a family of transition kernels on $\mathsf{X}$. We address the general framework when the target density $\pi$ is approximated by


4 G. FORT, E. MOULINES AND P. PRIOURET 

the process $\{X_n, n \ge 0\}$ such that the conditional distribution of $X_{n+1}$ given the past is $P_{\theta_n}(X_n, \cdot)$, where $\{\theta_n, n \ge 0\}$ is the adapted process. There are two main contributions. First, we cover the case when the ergodicity of the transition kernels $\{P_\theta, \theta \in \Theta\}$ is not uniform along the path $\{\theta_n, n \ge 0\}$. The second novelty is that we address the case when $P_\theta$ has an invariant distribution $\pi_\theta$ depending upon the parameter $\theta$; in this case, the adaptation has to be such that $\{\pi_{\theta_n}, n \ge 0\}$ converges weakly to $\pi$ (almost surely), and we provide sufficient conditions for this property to hold based on the (almost sure) weak convergence of the transition kernels $\{P_{\theta_n}, n \ge 0\}$. Such conditions are crucial in many applications where $\pi_\theta$ is known to exist but has no explicit expression. Generalizing the results to include these more realistic conditions therefore requires a more involved approach.

The paper is organized as follows. In Section 2, we establish the con- 
vergence of the marginal distribution and the strong law of large numbers 
for additive functionals for adaptive and interacting MCMC algorithms. 
These general results are applied to a running example, namely the adap- 
tive Metropolis algorithm. The novel contribution is the application to the 
convergence of the interacting tempering algorithm [Kou, Zhou and Wong 
(2006)] in Section 3. 

Notation. Let $(\mathsf{X}, \mathcal{X})$ be a general state space [see, e.g., Meyn and Tweedie (2009), Chapter 3] and $P$ be a Markov transition kernel. $P$ acts on bounded functions $f$ on $\mathsf{X}$ and on $\sigma$-finite positive measures $\mu$ on $\mathsf{X}$ via

$$Pf(x) \stackrel{\mathrm{def}}{=} \int P(x, dy) f(y), \qquad \mu P(A) \stackrel{\mathrm{def}}{=} \int \mu(dx) P(x, A).$$

For $n \in \mathbb{N}$, we denote by $P^n$ the $n$-iterated transition kernel defined by induction

$$P^n(x, A) \stackrel{\mathrm{def}}{=} \int P^{n-1}(x, dy) P(y, A) = \int P(x, dy) P^{n-1}(y, A),$$

with the convention that $P^0$ is the identity kernel. For a function $V: \mathsf{X} \to [1, +\infty)$, define the $V$-norm of a function $f: \mathsf{X} \to \mathbb{R}$ by

$$\|f\|_V \stackrel{\mathrm{def}}{=} \sup_{x \in \mathsf{X}} \frac{|f(x)|}{V(x)}.$$

When $V = 1$, the $V$-norm is the supremum norm and will be denoted by $\|f\|_\infty$. Let $\mathcal{L}_V$ be the set of functions such that $\|f\|_V < +\infty$. For two probability distributions $\mu_1, \mu_2$ on $\mathsf{X}$, define the $V$-distance

$$\|\mu_1 - \mu_2\|_V \stackrel{\mathrm{def}}{=} \sup_{\{f, \|f\|_V \le 1\}} |\mu_1(f) - \mu_2(f)|.$$

When $V = 1$, the $V$-distance is the total variation distance and is denoted by $\|\mu_1 - \mu_2\|_{\mathrm{TV}}$.

Denote by $C_b(\mathsf{X})$ the class of bounded continuous functions from $\mathsf{X}$ to $\mathbb{R}$. Recall that a Markov transition kernel $P$ on $(\mathsf{X}, \mathcal{X})$ is (weak) Feller if it maps $C_b(\mathsf{X})$ to $C_b(\mathsf{X})$.

A measurable set $A \in \mathcal{A}$ on a probability space $(\Omega, \mathcal{A}, \mathbb{P})$ is said to be a $\mathbb{P}$-full set if $\mathbb{P}(A) = 1$.
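On a finite state space $\{0, \dots, m-1\}$, the operations just defined reduce to matrix-vector products. The Python sketch below (function names are illustrative, not from the paper) implements $Pf$, $\mu P$, $P^n$, the $V$-norm and the $V$-distance. With the convention used here, the total variation distance equals $\sum_x |\mu_1(x) - \mu_2(x)|$, that is, twice the value given by the other common convention $\sup_A |\mu_1(A) - \mu_2(A)|$.

```python
def kernel_on_function(P, f):
    """(Pf)(x) = sum_y P(x, y) f(y)."""
    return [sum(p_xy * f_y for p_xy, f_y in zip(row, f)) for row in P]


def measure_times_kernel(mu, P):
    """(mu P)(y) = sum_x mu(x) P(x, y)."""
    m = len(mu)
    return [sum(mu[x] * P[x][y] for x in range(m)) for y in range(m)]


def kernel_power(P, n):
    """P^n by induction P^n = P^{n-1} P, with P^0 the identity kernel."""
    m = len(P)
    Q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for _ in range(n):
        Q = [measure_times_kernel(row, P) for row in Q]
    return Q


def v_norm(f, V):
    """||f||_V = sup_x |f(x)| / V(x)."""
    return max(abs(fx) / vx for fx, vx in zip(f, V))


def v_distance(mu1, mu2, V):
    """||mu1 - mu2||_V = sup_{||f||_V <= 1} |mu1(f) - mu2(f)|.
    On a finite space the sup is attained at f = V * sign(mu1 - mu2), which gives
    sum_x V(x) |mu1(x) - mu2(x)|; with V = 1 this is the TV distance used here."""
    return sum(vx * abs(a - b) for a, b, vx in zip(mu1, mu2, V))
```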

2. Main results. Let $(\Theta, \mathcal{T})$ be a measurable space and $(\mathsf{X}, \mathcal{X})$ a general state space. Let $\{P_\theta, \theta \in \Theta\}$ be a collection of Markov transition kernels indexed by $\theta \in \Theta$, which can be either finite or infinite dimensional. We consider an $\mathsf{X} \times \Theta$-valued process $\{(X_n, \theta_n), n \ge 0\}$ on a filtered probability space $(\Omega, \mathcal{A}, \{\mathcal{F}_n, n \ge 0\}, \mathbb{P})$. It is assumed that $(X_n, \theta_n)$ is $\mathcal{F}_n$-adapted and, for any bounded measurable function $f$,

$$\mathbb{E}[f(X_{n+1}) \mid \mathcal{F}_n] = P_{\theta_n} f(X_n). \tag{1}$$

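2.1. Ergodicity. For $V: \mathsf{X} \to [1, \infty)$ and $(\theta, \theta') \in \Theta^2$, denote by $D_V(\theta, \theta')$ the $V$-variation of the kernels $P_\theta$ and $P_{\theta'}$:

$$D_V(\theta, \theta') \stackrel{\mathrm{def}}{=} \sup_{x \in \mathsf{X}} \frac{\|P_\theta(x, \cdot) - P_{\theta'}(x, \cdot)\|_V}{V(x)}. \tag{2}$$

When $V = 1$, we use the simpler notation $D(\theta, \theta')$. Consider the following assumptions:

A1 For any $\theta \in \Theta$, there exists a probability distribution $\pi_\theta$ such that $\pi_\theta P_\theta = \pi_\theta$.

A2 (a) For any $\epsilon > 0$, there exists a nondecreasing sequence $\{r_\epsilon(n), n \ge 0\}$ in $\mathbb{N} \setminus \{0\}$ such that $\limsup_{n \to \infty} r_\epsilon(n)/n = 0$ and

$$\limsup_{n \to \infty} \mathbb{E}\big[\|P^{r_\epsilon(n)}_{\theta_{n-r_\epsilon(n)}}(X_{n-r_\epsilon(n)}, \cdot) - \pi_{\theta_{n-r_\epsilon(n)}}\|_{\mathrm{TV}}\big] \le \epsilon.$$

(b) For any $\epsilon > 0$, $\lim_{n \to \infty} \sum_{j=0}^{r_\epsilon(n)-1} \mathbb{E}[D(\theta_{n-r_\epsilon(n)+j}, \theta_{n-r_\epsilon(n)})] = 0$, where $D$ is defined in (2).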

Assumption A2(a) is implied by the containment condition introduced in Roberts and Rosenthal (2007): for any $\epsilon > 0$, the sequence $\{M_\epsilon(X_n, \theta_n), n \ge 0\}$ is bounded in probability, where, for $x \in \mathsf{X}$ and $\theta \in \Theta$,

$$M_\epsilon(x, \theta) = \inf\{n \ge 0, \|P_\theta^n(x, \cdot) - \pi_\theta\|_{\mathrm{TV}} \le \epsilon\}. \tag{3}$$

In this case, it is easily checked that A2(a) is satisfied by setting $r_\epsilon(n) = N$ for all $n \ge 0$, where $N$ is large enough. Assumption A2(a) is weaker than the containment condition because the sequence $\{r_\epsilon(n), n \ge 0\}$ can grow to infinity. This is important in applications where it is not known a priori that the parameter sequence $\{\theta_n, n \ge 0\}$ stays in a region where the ergodicity constants are controlled uniformly. Examples of such applications are given in a toy example and a more realistic example below.

Assumption A2(b) requires that the amount of change vanishes as $n$ goes to infinity at a rate which is matched with the rate at which the kernel converges to stationarity. If the kernel mixes uniformly fast along any parameter sequence $\{\theta_n, n \ge 0\}$, that is, $r_\epsilon(n) = N$ for any $n \ge 0$ for some integer $N$, A2(b) is equivalent to the diminishing adaptation condition introduced in Roberts and Rosenthal (2007): $\{D(\theta_n, \theta_{n-1}), n \ge 1\}$ converges to zero in probability, with no condition on the rate. On the other hand, if the ergodicity is not uniform along a sequence $\{\theta_n, n \ge 0\}$, then the adaptation should vanish at a fast enough rate. As expected, there is a trade-off between the rate of convergence of the chain and the rate at which the parameter can be adapted. This does not necessarily imply, however, that the parameter sequence $\{\theta_n, n \ge 0\}$ converges to some fixed value [see, e.g., Roberts and Rosenthal (2007)].

Theorem 2.1. Assume A1 and A2. Let $f$ be a bounded function such that $\lim_n \pi_{\theta_n}(f) = \alpha$, $\mathbb{P}$-a.s., for some constant $\alpha$. Then

$$\lim_{n \to \infty} \mathbb{E}[f(X_n)] = \alpha.$$

The proof is in Section 4.1. As a trivial corollary, we have:

Corollary 2.2. Assume A1 and A2. Assume $\{\pi_{\theta_n}, n \ge 0\}$ converges weakly to $\pi$, $\mathbb{P}$-a.s. Then $\lim_{n \to \infty} \mathbb{E}[f(X_n)] = \pi(f)$ for any bounded continuous function $f$.

When $\pi_\theta = \pi$ for any $\theta \in \Theta$, Theorem 2.1 improves the results of Roberts and Rosenthal (2007) by weakening the conditions on the transition kernels $\{P_\theta, \theta \in \Theta\}$ (the containment condition is not assumed to hold). The following example shows that ergodicity can be achieved even if the containment condition in Roberts and Rosenthal (2007) fails, provided that the adaptation rate is slow enough.

Example 1 (Toy example). Let us consider the following example introduced in Andrieu and Moulines (2006) and thoroughly analyzed in Andrieu and Thoms [(2008), Section 2] and Bai, Roberts and Rosenthal (2011). Let $\{\theta_n, n \ge 0\}$ be a $[0,1]$-valued deterministic sequence. Consider the nonhomogeneous Markov chain over $\mathsf{X} = \{0, 1\}$ with transition matrix

$$P_\theta = \begin{pmatrix} \theta & 1 - \theta \\ 1 - \theta & \theta \end{pmatrix}, \qquad \theta \in [0, 1]. \tag{4}$$

For any $\theta \in [0, 1]$, $\pi = [1/2, 1/2]$ is a stationary distribution; the chain is irreducible if $\theta \in (0, 1)$. In this case, for $\epsilon \in (0, 1)$ and $\theta \in (0, 1)$, $\theta \ne 1/2$,

$$M_\epsilon(x, \theta) = \lceil \ln(\epsilon)/\ln|1 - 2\theta| \rceil.$$
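The closed form for $M_\epsilon$ can be checked against definition (3) by iterating the kernel (4). The Python sketch below uses illustrative helper names; besides the check, it shows that $M_\epsilon(x, \theta_n)$ grows without bound along $\theta_n = n^{-1/4}$, which is why the containment condition fails in this example.

```python
import math


def toy_kernel(theta):
    """Transition matrix (4) on X = {0, 1}."""
    return [[theta, 1.0 - theta], [1.0 - theta, theta]]


def tv(mu1, mu2):
    """Total variation distance with the sup_{|f| <= 1} convention (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(mu1, mu2))


def m_eps_bruteforce(theta, eps, x=0, n_max=100000):
    """M_eps(x, theta) from (3): first n >= 0 with ||P_theta^n(x, .) - pi||_TV <= eps."""
    pi = [0.5, 0.5]
    P = toy_kernel(theta)
    row = [1.0 if y == x else 0.0 for y in range(2)]  # P^0(x, .) = delta_x
    for n in range(n_max + 1):
        if tv(row, pi) <= eps:
            return n
        row = [row[0] * P[0][y] + row[1] * P[1][y] for y in range(2)]  # P^{n+1}(x, .)
    raise ValueError("eps not reached within n_max steps")


def m_eps_closed_form(theta, eps):
    """ceil(ln eps / ln|1 - 2 theta|), for eps in (0, 1) and theta in (0, 1), theta != 1/2."""
    return math.ceil(math.log(eps) / math.log(abs(1.0 - 2.0 * theta)))


if __name__ == "__main__":
    # Along theta_n = n^{-1/4} the mixing time M_eps blows up.
    for n in (10**2, 10**4, 10**6, 10**8):
        print(n, m_eps_closed_form(n ** -0.25, 0.01))
```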

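Assume that, for $n \ge 1$, $\theta_n = n^{-1/4}$. Clearly, for any $\epsilon \in (0, 1)$, $\{M_\epsilon(X_n, \theta_n), n \ge 0\}$ grows to infinity with probability 1 and the containment condition does not hold [see also Bai, Roberts and Rosenthal (2011), Proposition 1]. Setting $r(n) = \lfloor n^{1/3} \rfloor$,

$$\limsup_{n \to \infty} \mathbb{E}\big\|P^{r(n)}_{\theta_{n-r(n)}}(X_{n-r(n)}, \cdot) - \pi\big\|_{\mathrm{TV}} = \limsup_{n \to \infty} |2\theta_{n-r(n)} - 1|^{r(n)} = 0,$$

since $|1 - 2\theta_{n-r(n)}|^{r(n)} \le \exp(-2 r(n) n^{-1/4})$ for $n$ large enough and $r(n) n^{-1/4} \to \infty$; this shows that A2(a) holds. Furthermore, we have

$$D(\theta, \theta') = \sup_{x \in \{0, 1\}} \|P_\theta(x, \cdot) - P_{\theta'}(x, \cdot)\|_{\mathrm{TV}} = 2|\theta - \theta'|.$$

Therefore, with $\theta_n = n^{-1/4}$, $D(\theta_n, \theta_{n-1}) = O(n^{-1})$, so that $\sum_{j=0}^{r(n)-1} D(\theta_{n-r(n)+j}, \theta_{n-r(n)}) \le r(n)^2 \max_{n-r(n) < k < n} D(\theta_k, \theta_{k-1}) = O(n^{-1/3})$, and A2(b) is satisfied with $r(n) = \lfloor n^{1/3} \rfloor$. Corollary 2.2 therefore applies, and the marginal distribution converges.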
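The two limits used in this example can be evaluated numerically. The Python sketch below (function names are illustrative) computes, for $\theta_n = n^{-1/4}$ and $r(n) = \lfloor n^{1/3} \rfloor$, the A2(a) quantity $|2\theta_{n-r(n)} - 1|^{r(n)}$ and the A2(b) sum, using $D(\theta, \theta') = 2|\theta - \theta'|$. Both decrease toward zero, although slowly, in line with the trade-off between mixing and adaptation discussed above.

```python
import math


def theta(n):
    return n ** -0.25  # theta_n = n^{-1/4}


def r(n):
    return max(1, math.floor(n ** (1.0 / 3.0)))  # r(n) = floor(n^{1/3}), kept >= 1


def a2a_term(n):
    """E||P^{r(n)}_{theta_{n-r(n)}}(X_{n-r(n)}, .) - pi||_TV = |2 theta_{n-r(n)} - 1|^{r(n)}
    (in this example the distance does not depend on the starting state)."""
    m = n - r(n)
    return abs(2.0 * theta(m) - 1.0) ** r(n)


def a2b_term(n):
    """sum_{j=0}^{r(n)-1} D(theta_{n-r(n)+j}, theta_{n-r(n)}) with D(t, t') = 2|t - t'|."""
    m = n - r(n)
    return sum(2.0 * abs(theta(m + j) - theta(m)) for j in range(r(n)))


if __name__ == "__main__":
    for n in (10**3, 10**6, 10**9):
        print(n, r(n), a2a_term(n), a2b_term(n))
```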

To check A2(a), it is often easier to use drift conditions. To simplify the discussion below, this paper only covers the case of drift inequalities for geometric ergodicity. Extensions to subgeometric rates of convergence can be obtained following the same lines [see, e.g., Bai, Roberts and Rosenthal (2011) and Atchade and Fort (2010)] and are left to future work. In the geometric setting, one commonly assumes the following simultaneous geometric drift and minorization conditions:

A3 For all $\theta \in \Theta$, $P_\theta$ is $\pi$-irreducible and aperiodic, and there exists a function $V: \mathsf{X} \to [1, +\infty)$ such that, for any $\theta \in \Theta$, there exist constants $b_\theta < \infty$, $\delta_\theta \in (0, 1)$, $\lambda_\theta \in (0, 1)$ and a probability measure $\nu_\theta$ on $\mathsf{X}$ such that

$$P_\theta V \le \lambda_\theta V + b_\theta,$$

$$P_\theta(x, \cdot) \ge \delta_\theta \nu_\theta(\cdot) \mathbb{1}_{\{V \le c_\theta\}}(x), \qquad c_\theta = 2 b_\theta (1 - \lambda_\theta)^{-1} - 1.$$
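To make the constants of A3 concrete, consider a hypothetical Gaussian autoregressive kernel $P(x, \cdot) = \mathcal{N}(\phi x, \sigma^2)$ on $\mathbb{R}$ with $0 < |\phi| < 1$ (this kernel is our own illustration, not an example from the paper) and $V(x) = 1 + x^2$. Then $PV(x) = 1 + \phi^2 x^2 + \sigma^2 = \phi^2 V(x) + (1 - \phi^2 + \sigma^2)$, so the drift inequality holds with $\lambda = \phi^2$ and $b = 1 - \phi^2 + \sigma^2$. The Python sketch computes $\lambda$, $b$ and the level $c$ of A3 and checks $PV$ by Monte Carlo; the minorization part of A3 is not checked here.

```python
import random


def ar1_drift_constants(phi, sigma):
    """For P(x, .) = N(phi x, sigma^2) and V(x) = 1 + x^2, PV = lam V + b holds exactly
    with lam = phi^2 and b = 1 - phi^2 + sigma^2; also returns c = 2 b (1 - lam)^{-1} - 1."""
    if not 0.0 < abs(phi) < 1.0:
        raise ValueError("need 0 < |phi| < 1 so that lam lies in (0, 1)")
    lam = phi ** 2
    b = 1.0 - phi ** 2 + sigma ** 2
    c = 2.0 * b / (1.0 - lam) - 1.0
    return lam, b, c


def pv_monte_carlo(x, phi, sigma, n_samples, rng):
    """Monte Carlo estimate of PV(x) = E[1 + (phi x + sigma Z)^2], Z ~ N(0, 1)."""
    total = 0.0
    for _ in range(n_samples):
        y = phi * x + sigma * rng.gauss(0.0, 1.0)
        total += 1.0 + y * y
    return total / n_samples
```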

A3 implies geometric ergodicity [see, e.g., Meyn and Tweedie (2009), Chapter 15]. The following lemma can be obtained from Roberts and Rosenthal (2004), Fort and Moulines (2003), Douc, Moulines and Rosenthal [(2004), Proposition 3] or Baxendale (2005) [see also the proof of Lemma 3 in Saksman and Vihola (2010) for a similar result].

Lemma 2.3. Assume A3. Then, for any $\theta \in \Theta$, there exists a probability distribution $\pi_\theta$ such that $\pi_\theta P_\theta = \pi_\theta$, $\pi_\theta(V) \le b_\theta(1 - \lambda_\theta)^{-1}$ and

$$\|P_\theta^n(x, \cdot) - \pi_\theta\|_V \le C_\theta \rho_\theta^n V(x)$$

for some finite constants $C_\theta$ and $\rho_\theta \in (0, 1)$. Furthermore, there exist positive constants $C$ and $\gamma$ such that, for any $\theta \in \Theta$,

$$L_\theta \stackrel{\mathrm{def}}{=} C_\theta \vee (1 - \rho_\theta)^{-1} \le C\{b_\theta \vee \delta_\theta^{-1} \vee (1 - \lambda_\theta)^{-1}\}^\gamma. \tag{5}$$

Example 2 [The adaptive Metropolis (AM) algorithm]. We establish the ergodicity of the AM algorithm. In this example, $\mathsf{X} = \mathbb{R}^d$ and the densities are assumed to be w.r.t. the Lebesgue measure. For $x \in \mathbb{R}^d$, $|x|$ denotes the Euclidean norm. For $\kappa > 0$, let $\mathcal{C}^d_\kappa$ be the set of symmetric and positive definite $d \times d$ matrices whose minimal eigenvalue is larger than $\kappa$. The parameter set $\Theta = \mathbb{R}^d \times \mathcal{C}^d_\kappa$ is endowed with the norm $|\theta|^2 = |\mu|^2 + \mathrm{Tr}(\Gamma^T \Gamma)$, where $\theta = (\mu, \Gamma)$.
At each iteration, $X_{n+1} \sim P_{\theta_n}(X_n, \cdot)$, where $P_\theta$ is defined by

$$P_\theta(x, A) \stackrel{\mathrm{def}}{=} \int \mathbb{1}_A(y) \Big(1 \wedge \frac{\pi(y)}{\pi(x)}\Big) q_\Gamma(y - x)\, dy + \mathbb{1}_A(x) \int \Big(1 - 1 \wedge \frac{\pi(y)}{\pi(x)}\Big) q_\Gamma(y - x)\, dy, \tag{6}$$

with $q_\Gamma$ the density of a Gaussian random variable with zero mean and covariance matrix $(2.38)^2 d^{-1} \Gamma$. The parameter $\theta_n = (\mu_n, \Gamma_n) \in \Theta$ is the sample mean and covariance matrix

$$\mu_{n+1} = \mu_n + \frac{1}{n+1}(X_{n+1} - \mu_n), \qquad \mu_0 = 0, \tag{7}$$

$$\Gamma_{n+1} = \frac{n}{n+1}\Gamma_n + \frac{1}{n+1}\big\{(X_{n+1} - \mu_n)(X_{n+1} - \mu_n)^T + \kappa I_d\big\}, \tag{8}$$

where $I_d$ is the identity matrix, $\Gamma_0 > 0$ and $\kappa$ is a positive constant.
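The recursions (7)-(8), combined with the Metropolis kernel (6), can be sketched in Python (standard library only) as follows. This is an illustrative implementation under our own choices (function names, the Gaussian test target and the value $\kappa = 10^{-3}$ are assumptions), not the authors' code. The proposal used to draw $X_{n+1}$ relies on $\Gamma_n$, and $(\mu, \Gamma)$ is updated only after $X_{n+1}$ has been drawn, as in (7)-(8).

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def am_update(mu, gamma, x_new, n, kappa):
    """Recursions (7)-(8): map (mu_n, Gamma_n) and X_{n+1} to (mu_{n+1}, Gamma_{n+1})."""
    d = len(mu)
    diff = [x_new[i] - mu[i] for i in range(d)]  # X_{n+1} - mu_n
    w = 1.0 / (n + 1)
    new_mu = [mu[i] + w * diff[i] for i in range(d)]
    new_gamma = [
        [(1.0 - w) * gamma[i][j] + w * (diff[i] * diff[j] + (kappa if i == j else 0.0))
         for j in range(d)]
        for i in range(d)
    ]
    return new_mu, new_gamma


def am_chain(log_pi, x0, gamma0, n_iter, kappa=1e-3, seed=0):
    """Run n_iter AM iterations; return the path X_1..X_n and the final (mu_n, Gamma_n)."""
    rng = random.Random(seed)
    d = len(x0)
    x = list(x0)
    mu = [0.0] * d  # mu_0 = 0
    gamma = [row[:] for row in gamma0]
    path = []
    for n in range(n_iter):
        # X_{n+1} ~ P_{theta_n}(X_n, .): Gaussian proposal with covariance (2.38^2/d) Gamma_n.
        low = cholesky([[(2.38 ** 2 / d) * g for g in row] for row in gamma])
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = [x[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        log_ratio = log_pi(y) - log_pi(x)
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = y
        path.append(list(x))
        mu, gamma = am_update(mu, gamma, x, n, kappa)
    return path, mu, gamma
```

The factor $(1 - w) = n/(n+1)$ together with the $\kappa I_d$ term keeps every eigenvalue of $\Gamma_n$ at least $\kappa$ once $n \ge 1$, which is what keeps $\theta_n$ in $\Theta$.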

By construction, for any $\theta \in \Theta$, $\pi$ is the stationary distribution for $P_\theta$, so that A1 holds with $\pi_\theta = \pi$ for any $\theta$. As in Saksman and Vihola (2010), we consider the following assumption:

M1 $\pi$ is positive, bounded, differentiable and

$$\lim_{r \to \infty} \sup_{|x| \ge r} \frac{x}{|x|^\rho} \cdot \nabla \log \pi(x) = -\infty$$

for some $\rho > 1$. Moreover, $\pi$ has regular contours, that is, for some $R > 0$,

$$\sup_{|x| \ge R} \frac{x}{|x|} \cdot \frac{\nabla \pi(x)}{|\nabla \pi(x)|} < 0.$$

Saksman and Vihola [(2010), Proposition 15] establish a drift inequality and a minorization condition on the kernel as in A3, with a drift function $V \propto \pi^{-s}$ with $s = 1/2$. The generalization to an arbitrary $s \in (0, 1)$ is straightforward. Note that the function

$$W(x) \stackrel{\mathrm{def}}{=} \pi^{-s}(x) \|\pi^s\|_\infty \tag{9}$$

grows faster than an exponential under M1 [see, e.g., Saksman and Vihola (2010), Lemma 8]. Hence, Lemma 2.3 and Proposition 15 of Saksman and Vihola (2010) together imply:

Lemma 2.4. Assume M1. For any $a \in (0, 1]$ and $\theta \in \Theta$, there exist $C_{a,\theta} < \infty$ and $\rho_{a,\theta} \in (0, 1)$ such that

$$\|P_\theta^k(x, \cdot) - \pi\|_{W^a} \le C_{a,\theta} \rho_{a,\theta}^k W^a(x) \quad \text{for any } x \in \mathbb{R}^d,$$

where $W$ is defined by (9). In addition, there exist finite constants $c_a, b_a$ such that

$$C_{a,\theta} \vee (1 - \rho_{a,\theta})^{-1} \le c_a |\theta|^{d\gamma/2} + b_a,$$

where the constant $\gamma$ is defined in Lemma 2.3.
In Saksman and Vihola [(2010), Lemma 12] it is proved that, under M1, the rate of growth of the parameters $\{\theta_n, n \ge 0\}$ is controlled. Namely, for any $\tau > 0$,

$$\sup_{n \ge 1} n^{-\tau} |\theta_n| < +\infty, \quad \mathbb{P}\text{-a.s.} \tag{10}$$

In the following lemma, we establish a control of the rate of growth of the state of the chain $\{X_n, n \ge 0\}$.

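Lemma 2.5. Assume M1. Then:

(i) $\mathbb{E}[W(X_n)] \le \mathbb{E}[W(X_0)] + nb$.

(ii) For any $t > 0$ and any $\tau > 0$, there exists a constant $C_{t,\tau}$ such that, for any $n \ge 0$,

$$\mathbb{E}\big[W(X_n) \mathbb{1}_{\{\sup_{1 \le k \le n-1} k^{-\tau}|\theta_k| \le t\}}\big] \le \mathbb{E}[W(X_0)] + C_{t,\tau} n^{\tau d\gamma},$$

where $\gamma$ is defined in Lemma 2.3.

(iii) If $\mathbb{E}[W(X_0)] < +\infty$, then for any $\tau > 0$, $\sup_{n \ge 1} n^{-(1+\tau)} W(X_n) < +\infty$, $\mathbb{P}$-a.s.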

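The proof of this lemma is given in Section 4.2. By combining Lemma 2.4 and Lemma 2.5, we prove A2(a): as a consequence of Lemma 2.4, it holds, for any $\tau > 0$, any $\beta \in (\tau d\gamma/2, 1)$ and any $t > 0$, that

$$\limsup_{n \to \infty} \sup_{\theta \in \Theta, |\theta| \le t n^\tau} \ \sup_{x \in \mathbb{R}^d, W(x) \le t n^{1+\tau}} \|P_\theta^{\lfloor n^\beta \rfloor}(x, \cdot) - \pi\|_{\mathrm{TV}} = 0, \tag{11}$$

where $\lfloor \cdot \rfloor$ denotes the lower integer part. For $t > 0$, set

$$\Omega_t \stackrel{\mathrm{def}}{=} \Big\{\omega : \sup_{n \ge 1} n^{-\tau} |\theta_n| \le t, \ \sup_{n \ge 1} n^{-1-\tau} W(X_n) \le t\Big\}.$$

Equation (10) and Lemma 2.5(iii) show that $\lim_{t \to \infty} \mathbb{P}(\Omega_t) = 1$. Set $r(n) = \lfloor n^\beta \rfloor$. The Fatou lemma and the monotone convergence theorem show that

$$\limsup_n \mathbb{E}\big[\|P^{r(n)}_{\theta_{n-r(n)}}(X_{n-r(n)}, \cdot) - \pi\|_{\mathrm{TV}}\big] \le \mathbb{E}\Big[\limsup_n \|P^{r(n)}_{\theta_{n-r(n)}}(X_{n-r(n)}, \cdot) - \pi\|_{\mathrm{TV}}\Big] = \lim_{t \to \infty} \mathbb{E}\Big[\limsup_n \|P^{r(n)}_{\theta_{n-r(n)}}(X_{n-r(n)}, \cdot) - \pi\|_{\mathrm{TV}} \mathbb{1}_{\Omega_t}\Big] = 0,$$

where the last equality follows from (11), since on $\Omega_t$ we have $|\theta_{n-r(n)}| \le t n^\tau$ and $W(X_{n-r(n)}) \le t n^{1+\tau}$. Therefore, A2(a) is satisfied, whereas the uniform containment condition [see (3)] seems to be very challenging to check.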

Consider now A2(b). It is proved in Andrieu and Moulines [(2006), Lemma 13] that, for any $(\theta, \theta') \in \Theta^2$ and $a \in [0, 1]$, $D_{W^a}(\theta, \theta') \le 2d\kappa^{-1}|\Gamma - \Gamma'|$. By definition of $\Gamma_n$ [see (8)], we have, for any $m < n$,

$$D_{W^a}(\theta_n, \theta_{n-m}) \le \frac{2d\kappa^{-1}}{n}\Big(\frac{m}{n-m}\sum_{j=0}^{n-m-1}|X_{j+1} - \mu_j|^2 + 2\kappa m \sqrt{d} + \sum_{j=n-m}^{n-1}|X_{j+1} - \mu_j|^2\Big).$$
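The bound above is deterministic, so it can be sanity-checked numerically. Unrolling (8) gives $n\Gamma_n = (n-m)\Gamma_{n-m} + \sum_{j=n-m}^{n-1}\{(X_{j+1} - \mu_j)(X_{j+1} - \mu_j)^T + \kappa I_d\}$; the Python sketch below (names illustrative) runs (7)-(8) on arbitrary data and checks the resulting inequality on the Frobenius norm $|\Gamma_n - \Gamma_{n-m}|$, which, multiplied by $2d\kappa^{-1}$, gives the displayed bound.

```python
import math
import random


def run_recursions(xs, gamma0, kappa):
    """Apply (7)-(8) to X_1, ..., X_N (xs[j] holds X_{j+1}).
    Returns lists with mus[k] = mu_k and gammas[k] = Gamma_k for k = 0..N."""
    d = len(gamma0)
    mu = [0.0] * d
    gamma = [row[:] for row in gamma0]
    mus, gammas = [mu], [gamma]
    for n, x in enumerate(xs):
        diff = [x[i] - mu[i] for i in range(d)]  # X_{n+1} - mu_n (uses the old mean)
        w = 1.0 / (n + 1)
        gamma = [[(1.0 - w) * gamma[i][j] + w * (diff[i] * diff[j] + (kappa if i == j else 0.0))
                  for j in range(d)] for i in range(d)]
        mu = [mu[i] + w * diff[i] for i in range(d)]
        mus.append(mu)
        gammas.append(gamma)
    return mus, gammas


def frobenius(a):
    return math.sqrt(sum(v * v for row in a for v in row))


def increment_bound(xs, mus, n, m, kappa):
    """n^{-1} ( m/(n-m) sum_{j<n-m} |X_{j+1}-mu_j|^2 + 2 kappa m sqrt(d)
                + sum_{j=n-m}^{n-1} |X_{j+1}-mu_j|^2 ), for 1 <= m < n."""
    d = len(mus[0])

    def sq(j):
        return sum((xs[j][i] - mus[j][i]) ** 2 for i in range(d))

    early = m / (n - m) * sum(sq(j) for j in range(n - m))
    recent = sum(sq(j) for j in range(n - m, n))
    return (early + 2.0 * kappa * m * math.sqrt(d) + recent) / n
```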

By definition of the empirical mean $\mu_k$ [see (7)], there exists a constant $C'$ such that $|\mu_k| \le C'\{k^{-1}\sum_{j=1}^{k}|X_j|^2\}^{1/2}$; under M1, $\liminf_{|x| \to \infty} \ln W(x)/|x| > 0$ [see the proof of Lemma 8 in Saksman and Vihola (2010)]. Therefore, there exists a constant $C$ such that

$$D_{W^a}(\theta_n, \theta_{n-m}) \le C\,\frac{m}{n}\Big(1 + \frac{1 + \ln(n-m)}{n-m}\sum_{j=1}^{n-m}(1 + \ln W(X_j))^2\Big) + \frac{C}{n}\Big(\frac{m}{n-m}\sum_{j=1}^{n}(1 + \ln W(X_j))^2 + \sum_{j=n-m+1}^{n}(1 + \ln W(X_j))^2\Big). \tag{12}$$

The proof of A2(b) now relies on the control of moments of the random variables $\{\ln W(X_j), j \ge 0\}$. Lemma 2.5(i) and Jensen's inequality show that the moment $\mathbb{E}[\ln W(X_n)]$ increases at most as $\ln n$. Then there exists a constant $C$ such that, for any $m < n$ and any $a \in [0, 1]$,

$$\mathbb{E}[D_{W^a}(\theta_n, \theta_{n-m})] \le C\,m\,\frac{\ln^3(n)}{n-m}\,\mathbb{E}[W(X_0)].$$

Then, for any $\beta \in (0, 1/2)$ and $r(n) = \lfloor n^\beta \rfloor$, $\lim_{n \to +\infty}\sum_{j=0}^{r(n)-1}\mathbb{E}[D(\theta_{n-r(n)+j}, \theta_{n-r(n)})] = 0$, and A2(b) holds. Combining the results above yields:

Theorem 2.6. Assume M1 and $\mathbb{E}[W(X_0)] < +\infty$. Then, for any bounded function $f$, $\lim_{n \to \infty}\mathbb{E}[f(X_n)] = \pi(f)$.

2.2. Strong law of large numbers for additive functionals. In this section, a strong law of large numbers (SLLN) is established. The main result of this section is Theorem 2.7, which provides an SLLN for a special class of additive functionals. To that goal, A3 is assumed to hold (which implies A1, see Lemma 2.3), and the diminishing adaptation and the stability conditions have to be strengthened:

A4 $\sum_{k \ge 1} k^{-1}(L_{\theta_k} \vee L_{\theta_{k-1}})^6 D_V(\theta_k, \theta_{k-1}) V(X_k) < +\infty$, $\mathbb{P}$-a.s., where $D_V$ and $L_\theta$ are defined in (2) and (5).

A5 (a) $\limsup_n \pi_{\theta_n}(V) < +\infty$, $\mathbb{P}$-a.s.

(b) For some $\alpha > 1$, $\sum_{k=0}^{\infty}(k+1)^{-\alpha} L_{\theta_k}^{\alpha} P_{\theta_k} V^{\alpha}(X_k) < +\infty$, $\mathbb{P}$-a.s.
Here again, these conditions balance the rate at which the transition kernel $P_\theta$ converges to stationarity and the adaptation speed. This is reflected in condition A4: $L_{\theta_k} \vee L_{\theta_{k-1}}$ is related to the rate of convergence of the kernels $P_{\theta_k}$ and $P_{\theta_{k-1}}$ to stationarity, and $D_V(\theta_k, \theta_{k-1})$ reflects the adaptation speed.

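Theorem 2.7. Assume A3, A4 and A5. Let $F: \mathsf{X} \times \Theta \to \mathbb{R}$ be a measurable function such that:

(i) $\sup_\theta \|F(\cdot, \theta)\|_V < +\infty$,

(ii) $\sum_{k \ge 1} k^{-1} L_{\theta_k} \|F(\cdot, \theta_k) - F(\cdot, \theta_{k-1})\|_V V(X_k) < +\infty$, $\mathbb{P}$-a.s.,

(iii) $\lim_{n \to \infty} \int \pi_{\theta_n}(dx) F(x, \theta_n)$ exists $\mathbb{P}$-a.s.

Then

$$\lim_{n \to \infty} \frac{1}{n}\sum_{k=0}^{n-1} F(X_k, \theta_k) = \lim_{n \to \infty} \int \pi_{\theta_n}(dx) F(x, \theta_n), \quad \mathbb{P}\text{-a.s.}$$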

The proof is in Section 4.3. When the function $F$ does not depend upon $\theta$, this theorem becomes the following.

Corollary 2.8. Assume A3, A4 and A5. Let $f: \mathsf{X} \to \mathbb{R}$ be a measurable function such that $\|f\|_V < +\infty$ and $\lim_{n \to \infty} \pi_{\theta_n}(f)$ exists $\mathbb{P}$-a.s. Then

$$\frac{1}{n}\sum_{k=0}^{n-1} f(X_k) \longrightarrow \lim_n \pi_{\theta_n}(f), \quad \mathbb{P}\text{-a.s.}$$



Example 3 (Toy example: law of large numbers). For $\theta \in (0, 1)$, the constants $C_\theta$ and $\rho_\theta$ (see Lemma 2.3) are, respectively, equal to 1 and $|1 - 2\theta|$, and $V = 1$. This implies that $L_\theta = 1/(2\theta)$ if $\theta < 1/2$ and $L_\theta = 1/(2(1 - \theta))$ otherwise. Therefore A4 is satisfied since $\sum_{k \ge 1} k^{-1}\theta_k^{-1}|\theta_{k-1} - \theta_k| < +\infty$ when $\theta_k = k^{-1/4}$. Assumption A5(a) is automatically satisfied because the stationary distribution does not depend on $\theta$. Assumption A5(b) is satisfied for any $\alpha > 4/3$ because in such a case $\sum_{k \ge 1}(k^{-1}\theta_k^{-1})^\alpha < \infty$. By Theorem 2.7, the SLLN is satisfied for this nonhomogeneous Markov chain.
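A short simulation illustrates Example 3. The Python sketch below (names, seed and starting index are our choices) runs the nonhomogeneous chain (4) with $\theta_k = k^{-1/4}$ and reports the empirical average of $f(x) = x$, which should approach $\pi(f) = 1/2$. It also compares tail increments of the A5(b) series for an exponent above and below $4/3$; partial sums cannot prove convergence, but the contrast between the two tails is visible.

```python
import random


def theta(k):
    return k ** -0.25  # theta_k = k^{-1/4}


def toy_running_average(n_steps, seed=0):
    """Simulate X_{k+1} ~ P_{theta_k}(X_k, .) with kernel (4): stay with probability
    theta_k, switch otherwise. Returns n^{-1} sum f(X_k) with f(x) = x.
    The recursion starts at k = 2 because theta_1 = 1 makes P_{theta_1} the identity."""
    rng = random.Random(seed)
    x, total = 0, 0
    for k in range(2, n_steps + 2):
        if rng.random() >= theta(k):  # switch with probability 1 - theta_k
            x = 1 - x
        total += x
    return total / n_steps


def l_theta(t):
    """L_theta for the toy chain: 1/(2 theta) if theta < 1/2, 1/(2(1 - theta)) otherwise."""
    return 1.0 / (2.0 * t) if t < 0.5 else 1.0 / (2.0 * (1.0 - t))


def a5b_partial_sum(alpha, n):
    """Partial sum of the A5(b) series with V = 1: sum_{k=2}^{n} ((k+1)^{-1} L_{theta_k})^alpha."""
    return sum((l_theta(theta(k)) / (k + 1)) ** alpha for k in range(2, n + 1))
```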

The stated assumptions are very general and, when applied to some specific settings, can be simplified. For example, in many interesting examples (see, e.g., Section 3), it is known that $\limsup_{n \to \infty} L_{\theta_n} < \infty$, $\mathbb{P}$-a.s., and, for some $\alpha > 1$, $\sup_{n \ge 0}\mathbb{E}[V^\alpha(X_n)] < \infty$. Under these assumptions, it is straightforward to establish the following corollary:

Corollary 2.9. Assume A3 and:

(i) $\limsup_{n \to \infty} L_{\theta_n} < \infty$ and $\limsup_{n \to \infty}\pi_{\theta_n}(V) < +\infty$, $\mathbb{P}$-a.s.,

(ii) there exists $\alpha > 1$ such that $\sup_{k \ge 0}\mathbb{E}[V^\alpha(X_k)] < +\infty$,

(iii) $\sum_{k \ge 1} k^{-1} D_V(\theta_k, \theta_{k-1}) V(X_k) < +\infty$, $\mathbb{P}$-a.s.

Let $f: \mathsf{X} \to \mathbb{R}$ be a measurable function such that $\|f\|_V < +\infty$ and $\lim_{n \to \infty}\pi_{\theta_n}(f)$ exists $\mathbb{P}$-a.s. Then $n^{-1}\sum_{k=0}^{n-1} f(X_k) \to \lim_{n \to \infty}\pi_{\theta_n}(f)$, $\mathbb{P}$-a.s.

Example 4 (AM: law of large numbers). Application of the above criteria yields the SLLN for the AM algorithm. This result has recently been obtained by Saksman and Vihola (2010).

Let $a \in (0, 1)$ and set $W(x) = \pi^{-s}(x)\|\pi^s\|_\infty$ for $s \in (0, 1)$. We prove that a (strong) LLN holds for any function $f$ in $\mathcal{L}_{W^a}$. We choose $\tau > 0$ small enough so that

$$(1 - a) > \tau(a + 3d\gamma), \qquad 1/a - 1 > \tau d\gamma(1/a + 1/2), \tag{13}$$

where $\gamma$ is given by Lemma 2.3. Consider A4 (with $V = W^a$). By Lemma 2.4 and (10), there exists a random variable $U_1$, $\mathbb{P}$-a.s. finite, such that $L_{\theta_k} \vee L_{\theta_{k-1}} \le U_1 k^{\tau d\gamma/2}$. By (12) and Lemma 2.5(iii), there exists a random variable $U_2$, $\mathbb{P}$-a.s. finite, such that $D_{W^a}(\theta_k, \theta_{k-1}) \le U_2 k^{-1}\ln^3 k$. Finally, applying Lemma 2.5(iii) again, there exists a random variable $U_3$, $\mathbb{P}$-a.s. finite, such that $W^a(X_k) \le U_3 k^{a(1+\tau)}$. Combining these inequalities shows that there exists a random variable $U$, $\mathbb{P}$-a.s. finite, such that

$$\sum_k k^{-1}(L_{\theta_k} \vee L_{\theta_{k-1}})^6 D_{W^a}(\theta_k, \theta_{k-1}) W^a(X_k) \le U \sum_k k^{-2 + a + \tau(a + 3d\gamma)}\ln^3 k,$$

thus showing A4 [observe that the RHS is finite by definition of $\tau$; see (13)]. The proof of A5(b) could rely on the same inequalities in the case $a \in (0, 1/2)$. Nevertheless, an SLLN can be established for larger values of $a$ by using the bound on $W(X_n)$ given by Lemma 2.5(ii), which improves on Lemma 2.5(iii). Set $\Omega_t = \{\sup_{n \ge 1} n^{-\tau}|\theta_n| \le t\}$. By (10), $\lim_{t \to +\infty}\mathbb{P}(\Omega_t) = 1$, and A5(b) (with $V = W^a$ and $\alpha = 1/a$) holds provided $\sum_k k^{-1/a} L_{\theta_k}^{1/a} P_{\theta_k} W(X_k)\mathbb{1}_{\Omega_t}$ is finite $\mathbb{P}$-a.s. for any $t > 0$. Lemmas 2.4 and 2.5(ii) imply that there exists a constant $C_t$ such that

$$\mathbb{E}\Big[\sum_k k^{-1/a} L_{\theta_k}^{1/a} P_{\theta_k} W(X_k)\mathbb{1}_{\Omega_t}\Big] \le C_t \sum_k k^{-1/a + \tau d\gamma(1/a + 1/2)}.$$

The RHS is finite by definition of $\tau$ [see (13)].
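Condition (13) is explicit, so an admissible $\tau$ is easy to compute. The Python helper below (names illustrative) returns the supremum of the admissible values of $\tau$ for given $a$, $d$ and $\gamma$, and evaluates the two exponents of $k$ appearing above, both of which must be smaller than $-1$. Since $\gamma$ is the unspecified constant of Lemma 2.3, any numerical value plugged in for it is hypothetical.

```python
def tau_upper_bound(a, d, gamma_const):
    """Supremum of tau allowed by the two strict inequalities in (13):
    (1 - a) > tau (a + 3 d gamma) and 1/a - 1 > tau d gamma (1/a + 1/2).
    Any tau in (0, returned value) satisfies (13)."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie in (0, 1)")
    b1 = (1.0 - a) / (a + 3.0 * d * gamma_const)
    b2 = (1.0 / a - 1.0) / (d * gamma_const * (1.0 / a + 0.5))
    return min(b1, b2)


def a4_exponent(a, d, gamma_const, tau):
    """Exponent of k in the A4 bound: -2 + a + tau (a + 3 d gamma); must be < -1."""
    return -2.0 + a + tau * (a + 3.0 * d * gamma_const)


def a5b_exponent(a, d, gamma_const, tau):
    """Exponent of k in the A5(b) bound: -1/a + tau d gamma (1/a + 1/2); must be < -1."""
    return -1.0 / a + tau * d * gamma_const * (1.0 / a + 0.5)
```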

The above discussion is summarized in the following theorem. 

Theorem 2.10. Assume M1 and $\mathbb{E}[W(X_0)] < +\infty$. Then, for any $a \in (0, 1)$ and any function $f \in \mathcal{L}_{W^a}$, $n^{-1}\sum_{k=1}^{n} f(X_k) \to \pi(f)$, $\mathbb{P}$-a.s.

2.3. Almost sure convergence of the invariant distributions. When the stationary distribution $\pi_\theta$ is not explicitly known, convergence of the sequence $\{\pi_{\theta_n}, n \ge 0\}$ has to be obtained from the convergence of the transition kernels $\{P_{\theta_n}, n \ge 0\}$. We propose below a set of sufficient conditions that allow us to prove the almost sure convergence of $\{\pi_{\theta_n}(f), n \ge 0\}$ for continuous functions $f$. The proof of Theorem 2.11 is in Section 4.4.
Theorem 2.11. Assume that $\mathsf{X}$ is a Polish space. Assume A3 and:

(i) $\limsup_{n \to \infty} L_{\theta_n} < \infty$, $\mathbb{P}$-a.s., where $L_\theta$ is given by (5),

(ii) for any function $f \in C_b(\mathsf{X})$, the class of functions $\{P_\theta f, \theta \in \Theta\}$ is equicontinuous,

(iii) there exists $\theta_\star \in \Theta$ and, for any $x \in \mathsf{X}$, a $\mathbb{P}$-full set $\Omega_x$ such that, for any $\omega \in \Omega_x$, $\{P_{\theta_n(\omega)}(x, \cdot), n \ge 0\}$ converges weakly to $P_{\theta_\star}(x, \cdot)$.

Then there exists a $\mathbb{P}$-full set $\Omega_0$ such that, for any $\omega \in \Omega_0$ and $f \in C_b(\mathsf{X})$, $\lim_n \pi_{\theta_n(\omega)}(f) = \pi_{\theta_\star}(f)$ or, equivalently, for any $\omega \in \Omega_0$, $\pi_{\theta_n(\omega)}$ converges weakly to $\pi_{\theta_\star}$.

Note that the weak convergence implies that, for any $\omega \in \Omega_0$ and any set $A$ such that $\pi_{\theta_\star}(\partial A) = 0$, where $\partial A$ denotes the boundary of $A$, $\lim_n \pi_{\theta_n(\omega)}(A) = \pi_{\theta_\star}(A)$.

Theorem 2.11 might be seen as an extension of the classical results on the continuity of the perturbations of the spectrum and eigenprojections, but it is stated under assumptions that are weaker than what is usually assumed [Kato (1980), Theorem 3.16]. The difference stems from the fact that condition (iii) does not imply the convergence of $P_{\theta_n}$ to $P_{\theta_\star}$ in operator norm. This is crucial to deal with the interacting tempering algorithm (see Section 3).

Condition (iii) of Theorem 2.11 is certainly the most difficult to check. When it is known that, for any function $f \in C_b(\mathsf{X})$, there exists a $\mathbb{P}$-full set $\Omega_{x,f}$ such that, for any $\omega \in \Omega_{x,f}$, $\lim_n P_{\theta_n(\omega)} f(x) = P_{\theta_\star} f(x)$, the existence of a $\mathbb{P}$-full set that is uniform in $f \in C_b(\mathsf{X})$ relies on the characterization of weak convergence by a separable class of functions [see Dudley (2002), Theorem 11.4.1, and Proposition 3.3 below for an example].

3. Convergence of the interacting tempering (IT) algorithm. We consider the interacting tempering algorithm, which is a simplified form of the equi-energy sampler of Kou, Zhou and Wong (2006).

Assume that $\mathsf{X}$ is a Polish space equipped with its Borel $\sigma$-field $\mathcal{X}$. Let $\pi$ be the target density w.r.t. a measure $\mu$ on $(\mathsf{X}, \mathcal{X})$. Denote by $K$ the number of different temperature levels, $T_1 = 1 < T_2 < \cdots < T_K$. For $k \in \{1, \dots, K-1\}$, let $P^{(k)}$ be a transition kernel on $(\mathsf{X}, \mathcal{X})$ with unique invariant distribution proportional to $\pi^{1/T_k}$. Fix $\nu \in (0, 1)$, the probability of interaction.

We denote by $X^{(k)} = (X_n^{(k)})_n$ the sampled values at temperature $T_k$. The chains are defined by induction on $k$: given the past of the process $X^{(k+1)}$ up to time $n$ and the current value $X_n^{(k)}$ of the current process $X^{(k)}$, we define $X_{n+1}^{(k)}$ as follows:

1. with probability $(1 - \nu)$, the state $X_{n+1}^{(k)}$ is sampled using the Markov kernel $P^{(k)}(X_n^{(k)}, \cdot)$,
2. with probability $\nu$, a tentative state $Z_{n+1}$ is drawn at random from the past $\{X_\ell^{(k+1)}, \ell \le n\}$. This move is accepted with probability $1 \wedge [\pi(Z_{n+1})/\pi(X_n^{(k)})]^{1/T_k - 1/T_{k+1}}$; otherwise $X_{n+1}^{(k)} = X_n^{(k)}$.

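We consider first the case $K = 2$. We will then address the general case (see Theorem 3.6 below). For notational simplicity, we set $T_2 = T > 1$ and $P^{(1)} = P$. Denote by $\Theta$ the set of the probability measures on $(\mathsf{X}, \mathcal{X})$. For any distribution $\theta \in \Theta$, define the transition kernel $P_\theta(x, \cdot) = (1 - \nu)P(x, \cdot) + \nu K_\theta(x, \cdot)$, where, for any $A \in \mathcal{X}$,

$$K_\theta(x, A) \stackrel{\mathrm{def}}{=} \int_A \alpha(x, y)\,\theta(dy) + \mathbb{1}_A(x)\int\{1 - \alpha(x, y)\}\,\theta(dy), \tag{14}$$

with

$$\alpha(x, y) = 1 \wedge \frac{\pi(y)\pi^{1/T}(x)}{\pi(x)\pi^{1/T}(y)} = 1 \wedge \Big(\frac{\pi(y)}{\pi(x)}\Big)^{1 - 1/T} \in [0, 1]. \tag{15}$$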

Denote by $\{Y_n, n \ge 0\}$ the process run at temperature $T$. It is not assumed that $\{Y_n, n \ge 0\}$ is a Markov chain. We simply assume that, for any bounded continuous function $f$, $n^{-1}\sum_{k=1}^{n} f(Y_k) \to \theta_\star(f)$ a.s., where $\theta_\star$ is the probability distribution on $(\mathsf{X}, \mathcal{X})$ with density (w.r.t. $\mu$) proportional to $\pi^{1/T}$. We consider the process $\{X_n, n \ge 0\}$ defined, for each $n \ge 0$ and any bounded function $f: \mathsf{X} \to \mathbb{R}$, by

$$\mathbb{E}[f(X_{n+1}) \mid \mathcal{F}_n] = P_{\theta_n} f(X_n), \quad \text{where } \theta_n(f) = (n+1)^{-1}\sum_{k=0}^{n} f(Y_k).$$
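The $K = 2$ construction can be simulated directly. The Python sketch below is illustrative only: the bimodal target, the temperature $T = 4$, the interaction probability $\nu = 0.1$ and the use of random-walk Metropolis kernels for both $P$ and the auxiliary process $\{Y_n\}$ are our assumptions. At each step, the $X$-chain either makes a move of $P$ (probability $1 - \nu$) or draws $Z$ uniformly from $\{Y_0, \dots, Y_n\}$, that is, from $\theta_n$, and accepts it with probability (15).

```python
import math
import random


def log_target(x):
    """Unnormalized log-density of the mixture 0.5 N(-4, 1) + 0.5 N(4, 1)."""
    a, b = -0.5 * (x + 4.0) ** 2, -0.5 * (x - 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def metropolis_move(x, log_density, step, rng):
    """Symmetric random-walk Metropolis move for an (unnormalized) log-density."""
    y = x + step * rng.gauss(0.0, 1.0)
    diff = log_density(y) - log_density(x)
    return y if diff >= 0.0 or rng.random() < math.exp(diff) else x


def interaction_accept_prob(x, z, temp):
    """alpha(x, z) of (15): 1 ∧ (pi(z)/pi(x))^{1 - 1/T}."""
    e = (1.0 - 1.0 / temp) * (log_target(z) - log_target(x))
    return 1.0 if e >= 0.0 else math.exp(e)


def interacting_tempering(n_iter, temp=4.0, nu=0.1, seed=0):
    rng = random.Random(seed)
    ys = [-4.0]  # Y_0; the auxiliary chain targets pi^{1/T}
    x, xs = -4.0, []
    tempered = lambda v: log_target(v) / temp
    for _ in range(n_iter):
        if rng.random() < nu:
            z = ys[rng.randrange(len(ys))]  # Z ~ theta_n, the empirical measure of Y_0..Y_n
            if rng.random() < interaction_accept_prob(x, z, temp):
                x = z
        else:
            x = metropolis_move(x, log_target, 1.0, rng)  # move of P, which leaves pi invariant
        xs.append(x)
        ys.append(metropolis_move(ys[-1], tempered, 3.0, rng))  # Y_{n+1}
    return xs, ys
```

The $X$-chain alone, with step size 1, would rarely cross between the two modes; the interaction move lets it jump to states visited by the flatter tempered chain.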

Since, by construction, $\pi P_{\theta_\star} = \pi$, it is expected that the marginal distribution of $X_k$ converges to $\pi$ as $k$ goes to infinity. To go further, some additional assumptions are required:

11 7r is a continuous positive density on X and ||7r||oo < +oo. 

12 (a) P is a 7r-irreducible aperiodic Feller transition kernel on (X, X) such 

that ttP = ir. 

(b) There exist r G (0, 1/T), A G (0, 1) and b < +oo such that 

(16) PW<XW + b withW(x) < ^(7r(x)/||7r|| 0O )~ r . 

(c) For any p € (0, ||7r||oo), the sets {ir > p} are 1-small (w.r.t. the tran- 
sition kernel P). 

When $\mathsf{X} \subseteq \mathbb{R}^d$ and $P$ is a symmetric random-walk Metropolis (SRWM) kernel,
then $\pi P = \pi$ and $P$ is $\pi$-irreducible [Mengersen and Tweedie (1996),
Lemma 1.1]. If, in addition, the proposal density is continuous on $\mathsf{X}$ then,
since $\pi$ is positive and continuous on $\mathsf{X}$, any compact set of $\mathsf{X}$ is 1-small
[Mengersen and Tweedie (1996), Lemma 1.2]. Therefore, the transition
kernel of an SRWM algorithm satisfies I2(a) and I2(c).
Drift conditions of the form I2(b) for the SRWM algorithm on $\mathsf{X} \subseteq \mathbb{R}^d$
are discussed in Roberts and Tweedie (1996), Jarner and Hansen (2000)
and Saksman and Vihola (2010). Under conditions which imply that the
target density $\pi$ is super-exponential and has regular contours (see M1),
Jarner and Hansen (2000) and Saksman and Vihola (2010) show that any
function proportional to $\pi^{-s}$ with $s \in (0,1)$ satisfies a Foster-Lyapunov
drift inequality [Jarner and Hansen (2000), Theorems 4.1 and 4.3]. Under
this condition, I2(b) is satisfied with any $\tau$ in the interval $(0, 1/T)$.
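A quick numerical look at a drift inequality of the form (16) may help. For the hypothetical one-dimensional target $\pi(x) = e^{-x^2/2}$ (so $\|\pi\|_\infty = 1$ and $W(x) = e^{\tau x^2/2}$) and an SRWM kernel with $\mathcal{N}(0, \sigma^2)$ proposals, the sketch below approximates $PW(x)/W(x)$ by a Riemann sum over the proposal. It illustrates the phenomenon rather than proving anything: the ratio exceeds 1 near the mode (this excess is absorbed by the constant $b$) and falls below 1 in the tails.

```python
import math


def srwm_drift_ratio(x, tau, sigma=1.0, h=1e-3, width=12.0):
    """Approximate PW(x)/W(x) for SRWM targeting pi(x) = exp(-x^2/2), with W = pi^(-tau)."""
    total = 0.0
    n = int(2 * width * sigma / h)
    for i in range(n + 1):
        y = x - width * sigma + i * h
        q = math.exp(-0.5 * ((y - x) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        d = y * y - x * x
        log_acc = min(0.0, -0.5 * d)                 # log a(x, y) = log(1 ^ pi(y)/pi(x))
        moved = math.exp(log_acc + 0.5 * tau * d)    # a(x, y) W(y) / W(x), never overflows
        stayed = 1.0 - math.exp(log_acc)             # a rejected move keeps W(x)
        total += q * (moved + stayed) * h
    return total


for x in (0.0, 2.0, 5.0, 10.0):
    print(x, srwm_drift_ratio(x, tau=0.5))
```

At $x = 0$ the ratio has the closed form $1 + (2-\tau)^{-1/2} - 2^{-1/2}$ (for $\sigma = 1$), which gives a convenient accuracy check of the quadrature.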

Stability conditions on the auxiliary process $\{Y_n, n \ge 0\}$ are also required.

I3 (a) $\theta_\star(W) < +\infty$ and, for any continuous function $f$ in $\mathcal{L}_W$, $\theta_n(f) \to \theta_\star(f)$, $\mathbb{P}$-a.s.

(b) $\sup_n \mathbb{E}[W(Y_n)] < +\infty$.

The following proposition is the key ingredient to prove the convergence of
the IT sampler. Under the stated assumptions, we prove that the transition
kernels $\{P_\theta, \theta \in \Theta\}$ satisfy a Foster-Lyapunov drift inequality and a
minorization condition. The proof of Proposition 3.1 is adapted from Atchade
[(2010), Lemma 4.1]; a detailed proof is given in Fort, Moulines and Priouret
(2011), Section 2.

Proposition 3.1. Assume I1 and I2. Then there exist $\tilde\lambda \in (0,1)$ and $\tilde b < \infty$
such that, for any $\theta \in \Theta$,

$$
P_\theta W(x) \le \tilde\lambda\, W(x) + \tilde b\, \theta(W). \tag{17}
$$

In addition, for any $p \in (0, \|\pi\|_\infty)$, the level sets $\{\pi \ge p\}$ are 1-small w.r.t.
the transition kernels $P_\theta$, and the minorization constant does not depend
upon $\theta$.

Corollary 3.2. Assume I1, I2, I3 and $\mathbb{E}[W(X_0)] < +\infty$. Then:

(i) $\sup_{n \ge 0} \mathbb{E}[W(X_n)] < +\infty$,
(ii) $\limsup_{n \to \infty} L_{\theta_n} < +\infty$ $\mathbb{P}$-a.s., where $L_\theta$ is defined by (5).

The proof of Corollary 3.2 is in Section 5.1. As a consequence of Proposition
3.1, the transition kernel $P_\theta$ possesses a unique invariant distribution
$\pi_\theta$. Ergodicity and the SLLN for additive functionals both require the
a.s. convergence of $\pi_{\theta_n}(f)$ (see Theorems 2.1 and 2.7). However, in this
example, $\pi_\theta$ does not have an explicit expression. The proof of the following
proposition is postponed to Section 5.2.

Proposition 3.3. Assume I1, I2, I3 and $\mathbb{E}[W(X_0)] < +\infty$. Then the
conditions of Theorem 2.11 hold and, for any bounded continuous function $f$,
$\lim_n \pi_{\theta_n}(f) = \pi(f)$, $\mathbb{P}$-a.s.

We now address the convergence of the marginals.

Theorem 3.4. Assume I1, I2, I3 and $\mathbb{E}[W(X_0)] < +\infty$. Then, for any
bounded continuous function $f$, $\lim_n \mathbb{E}[f(X_n)] = \pi(f)$.

Proof. We check the assumptions of Corollary 2.2. By Corollary 3.2(i),
$\{W(X_n), n \ge 0\}$ is bounded in probability. Furthermore, Corollary 3.2(ii)
implies that $\limsup_n C_{\theta_n} < +\infty$ $\mathbb{P}$-a.s. and $\limsup_n \rho_{\theta_n} < 1$ $\mathbb{P}$-a.s. This
proves A2(a).

The next step is to establish A2(b). Since, for any bounded function $f$,
$\theta_{n+m}(f) = (n+m+1)^{-1}\sum_{k=n+1}^{n+m} f(Y_k) + (n+1)(n+m+1)^{-1}\theta_n(f)$, we have

$$
|P_{\theta_{n+m}} f(x) - P_{\theta_n} f(x)| \le \sup_{y,z \in \mathsf{X}} |f(y) - f(z)|\, \|\theta_{n+m} - \theta_n\|_{\mathrm{TV}} \le \frac{2\|f\|_\infty\, m}{n+m+1}.
$$

Consequently, $D(\theta_{n+m}, \theta_n)$ is deterministically bounded by a sequence converging
to zero as $n \to \infty$. We have

$$
\sum_{j=1}^{r_\epsilon(n)-1} \mathbb{E}\big[D(\theta_{n-r_\epsilon(n)+j}, \theta_{n-r_\epsilon(n)})\big] \le 2 \sum_{j=1}^{r_\epsilon(n)-1} \frac{j}{n - r_\epsilon(n) + j + 1} \le \frac{r_\epsilon^2(n)}{n - r_\epsilon(n)},
$$

thus proving A2(b) with any sequence of the form $r_\epsilon(n) = n^{r}$ with $r < 1/2$.
Finally, Proposition 3.3 proves the convergence of $\pi_{\theta_n}(f)$ for any bounded
continuous function $f$. □
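The two facts used for A2(b), namely the recursion for $\theta_{n+m}$ and the bound $\|\theta_{n+m} - \theta_n\|_{\mathrm{TV}} \le m/(n+m+1)$ (total variation taken as $\sup_A |\mu(A)|$), as well as the decay of the resulting sum when $r_\epsilon(n) = n^r$ with $r < 1/2$, can be checked on a finite state space. The sketch below assumes integer-valued $Y_k$ drawn at random; all names are illustrative.

```python
import random
from collections import Counter


def empirical(ys):
    """Empirical measure len(ys)^{-1} sum_k delta_{y_k}, as a dict state -> mass."""
    counts = Counter(ys)
    return {y: c / len(ys) for y, c in counts.items()}


def tv(mu, nu):
    """Total variation sup_A |mu(A) - nu(A)| = 0.5 * sum_y |mu(y) - nu(y)|."""
    keys = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(k, 0.0) - nu.get(k, 0.0)) for k in keys)


def a2b_sum(n, r):
    """2 * sum_{j=1}^{r-1} j / (n - r + j + 1), the bound on the sum in A2(b)."""
    return 2.0 * sum(j / (n - r + j + 1) for j in range(1, r))


rng = random.Random(0)
Y = [rng.randint(0, 4) for _ in range(5000)]
n, m = 999, 250
theta_n = empirical(Y[: n + 1])        # theta_n uses Y_0, ..., Y_n
theta_nm = empirical(Y[: n + m + 1])   # theta_{n+m}
print(tv(theta_nm, theta_n), m / (n + m + 1))
for N in (10**3, 10**4, 10**5):
    r = int(N ** 0.4)                  # r_eps(N) = N^r with r = 0.4 < 1/2
    print(N, r, a2b_sum(N, r), r * r / (N - r))
```

The printed sums decrease roughly like $r_\epsilon^2(n)/n$, which is why any exponent $r < 1/2$ suffices.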

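We now state the strong law of large numbers for the IT sampler.

Theorem 3.5. Assume I1, I2, I3 and $\mathbb{E}[W(X_0)] < +\infty$. Then:

(i) for any measurable set $A$ such that $\int_{\partial A} \pi\, d\mu = 0$, where $\partial A$ is the
boundary of $A$, $n^{-1}\sum_{k=0}^{n-1} \mathbb{1}_A(X_k) \to \int_A \pi\, d\mu$, $\mathbb{P}$-a.s.;
(ii) for any $a \in (0,1)$ and any continuous function $f$ in $\mathcal{L}_{W^a}$,

$$
\frac{1}{n} \sum_{k=0}^{n-1} f(X_k) \to \int f\, \pi\, d\mu, \qquad \mathbb{P}\text{-a.s.}
$$

Proof. We check conditions (i), (ii) and (iii) of Corollary 2.9 with $V =
W^a$ for $a \in (0,1)$, and $\alpha = 1/a$. Assumption A3 holds and $\limsup_n L_{\theta_n} <
+\infty$ $\mathbb{P}$-a.s. [see Proposition 3.1 and Corollary 3.2(ii)]. The drift condition (17)
implies that

$$
\limsup_n \pi_{\theta_n}(W) \le \frac{\tilde b}{1 - \tilde\lambda} \limsup_n \theta_n(W). \tag{18}
$$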




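Since $W$ is continuous, assumption I3(a) implies that $\limsup_n \theta_n(W) <
\infty$ $\mathbb{P}$-a.s. Hence, condition (i) of Corollary 2.9 holds. Corollary 3.2(i) implies
condition (ii) of Corollary 2.9. The definition (2) of $D_V$ implies

$$
D_V(\theta_k, \theta_{k-1}) \le \|\theta_k - \theta_{k-1}\|_V \le \frac{1}{k+1}\, \theta_{k-1}(V) + \frac{1}{k+1}\, V(Y_k).
$$

Hence, under I3(a), condition (iii) of Corollary 2.9 holds if $\sum_k k^{-2} V(X_k) <
+\infty$ and $\sum_k k^{-2} V(X_k) V(Y_k) < +\infty$ $\mathbb{P}$-a.s. The first series converges since,
by Corollary 3.2(i), $\sup_k \mathbb{E}[V(X_k)] < +\infty$. For the second series, it is
sufficient to prove that $\sum_k k^{-2/p} V^{1/p}(X_k) V^{1/p}(Y_k) < +\infty$ w.p.1 with $p \stackrel{\mathrm{def}}{=}
(2a) \vee 1$ (indeed, $\sum_k b_k \le (\sum_k b_k^{1/p})^p$ for nonnegative $b_k$ and $p \ge 1$). By the
Cauchy-Schwarz inequality,

$$
\begin{aligned}
\mathbb{E}[V^{1/p}(Y_k) V^{1/p}(X_k)] &\le \mathbb{E}[V^{2/p}(Y_k)]^{1/2}\, \mathbb{E}[V^{2/p}(X_k)]^{1/2} \\
&\le \mathbb{E}[V^{1/a}(Y_k)]^{1/2}\, \mathbb{E}[V^{1/a}(X_k)]^{1/2} \\
&= \mathbb{E}[W(Y_k)]^{1/2}\, \mathbb{E}[W(X_k)]^{1/2},
\end{aligned}
$$

where the second inequality uses $V \ge 1$ and $2/p \le 1/a$. The RHS is bounded uniformly in $k$ under I3(b) and Corollary 3.2(i). Since $2/p > 1$, the series $\sum_k k^{-2/p}$ converges, so the second series has finite expectation and is therefore finite w.p.1. This concludes the proof of condition (iii) of Corollary 2.9.

It remains to prove that $\lim_n \pi_{\theta_n}(f) = \pi(f)$ $\mathbb{P}$-a.s. By Proposition 3.3, this
property holds for any bounded continuous function $f$ and for $f = \mathbb{1}_A$ with any set $A$ such
that $\int_{\partial A} \pi\, d\mu = 0$. We proved that there exists $\alpha > 1$ such that
$\limsup_n \pi_{\theta_n}(V^\alpha) + \pi(V^\alpha) < +\infty$ [see (18)]. Classical truncation arguments
imply that $\lim_n \pi_{\theta_n}(f) = \pi(f)$ $\mathbb{P}$-a.s. for any continuous function $f \in \mathcal{L}_V$ [see,
e.g., Billingsley (1999), Theorem 3.5, or similar arguments in the proof of
Proposition 4.3]. □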
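The exponent bookkeeping in the proof above (the choice $p = (2a) \vee 1$) can be sanity-checked numerically: $p \ge 1$ is needed for $\sum_k b_k \le (\sum_k b_k^{1/p})^p$, $2/p \le 1/a$ gives $V^{2/p} \le V^{1/a} = W$ when $V \ge 1$, and $2/p > 1$ makes $\sum_k k^{-2/p}$ finite. The snippet is a hypothetical check of these three inequalities over a grid of $a$, plus the power-sum inequality on random data; it is not part of the original argument.

```python
import random


def exponent_conditions(a):
    """The three requirements on p = max(2a, 1) used in the proof of Theorem 3.5."""
    p = max(2.0 * a, 1.0)
    return (p >= 1.0, 2.0 / p <= 1.0 / a, 2.0 / p > 1.0)


def power_sum_ok(bs, p):
    """Check sum_k b_k <= (sum_k b_k^{1/p})^p for nonnegative b_k and p >= 1."""
    lhs = sum(bs)
    rhs = sum(b ** (1.0 / p) for b in bs) ** p
    return lhs <= rhs * (1.0 + 1e-12)


grid = [i / 100.0 for i in range(1, 100)]   # a in (0, 1)
print(all(all(exponent_conditions(a)) for a in grid))
rng = random.Random(1)
bs = [rng.random() * 10.0 for _ in range(50)]
print(power_sum_ok(bs, 1.5))
```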

To summarize the above discussion, the process $\{X_n, n \ge 0\}$ has uniformly
bounded $W$-moments (see Corollary 3.2), the distribution of $X_n$ converges
to $\pi$ as $n \to +\infty$ (Theorem 3.4), and a strong law of large numbers
is satisfied for a wide family of functions (Theorem 3.5). These results are
obtained provided that the auxiliary process also possesses uniformly bounded
$W$-moments and satisfies a strong law of large numbers (see I3). Repeated
applications of this result provide sufficient conditions for the interacting
tempering algorithm with multiple stages to be ergodic and to satisfy a strong law
of large numbers. Recall that the IT algorithm recursively defines $K$ random
sequences $X^{(i)} = \{X^{(i)}_n, n \ge 0\}$, $i \in \{1, \dots, K\}$, such that $X^{(i)}$ targets the
distribution proportional to $\pi^{1/T_i}$. We are interested in $X^{(1)}$, which targets
$\pi^{1/T_1} = \pi$. The proof of Theorem 3.6 is in Section 5.3.

Theorem 3.6. Let $(\mathsf{X}, \mathcal{X})$ be a Polish space, and let $\pi$ be a density (w.r.t.
a measure $\mu$) satisfying I1. Choose $T_\star > 1$ and $T_1 = 1 < T_2 < \cdots < T_K < T_\star$.
Assume that, for any $i \in \{1, \dots, K-1\}$, there exists a $\pi$-irreducible Feller
transition kernel $P^{(i)}$ on $(\mathsf{X}, \mathcal{X})$ such that:

(i) $\pi^{1/T_i} P^{(i)} = \pi^{1/T_i}$,
(ii) for any $s \in (0, 1/T_i)$, there exist $\lambda^{(i)} \in (0,1)$ and $b^{(i)} < +\infty$ such that
$P^{(i)} U_s \le \lambda^{(i)} U_s + b^{(i)}$, where $U_s \propto \pi^{-s}$.

Assume in addition that there exists $T \in (T_K, T_\star)$ such that:

(iii) $\int \pi^{1/T_K - 1/T}\, d\mu < +\infty$,

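(iv) for any continuous function $f$ in $\mathcal{L}_{\pi^{-1/T}}$, $n^{-1}\sum_{k=1}^n f(X^{(K)}_k) \to \int f\, \pi^{1/T_K}\, d\mu \big/ \int \pi^{1/T_K}\, d\mu$, $\mathbb{P}$-a.s.,

(v) $\sup_n \mathbb{E}[\pi^{-1/T}(X^{(K)}_n)] < \infty$.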

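Finally, assume that for any $i \in \{1, \dots, K-1\}$, $\mathbb{E}[\pi^{-1}(X^{(i)}_0)] < +\infty$.
Then, for any continuous function $f$ in $\mathcal{L}_{\pi^{-1/T_\star}}$,

$$
\frac{1}{n} \sum_{k=1}^{n} f(X^{(1)}_k) \to \int f\, \pi\, d\mu, \qquad \mathbb{P}\text{-a.s.}
$$

Note that, since convergence holds for any continuous function $f$ in $\mathcal{L}_{\pi^{-1/T_\star}}$,
it also holds with $f = \mathbb{1}_A$, where $A$ is a measurable set such that $\int_{\partial A} \pi\, d\mu = 0$.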

We conclude this section with an example of an SRWM-based interacting
tempering algorithm for which the conditions of Theorem 3.6 hold. The proof
is in Section 5.4.

Proposition 3.7. Let $\pi$ be a super-exponential density on $\mathsf{X} = \mathbb{R}^d$ with
regular contours (i.e., satisfying M1). Let $T_\star \in (1, +\infty)$ and choose a
temperature ladder $1 = T_1 < \cdots < T_K < T_\star$. Consider the $K$-stage interacting
tempering algorithm with:

• for $i \in \{1, \dots, K-1\}$, $P^{(i)}$ is an SRWM transition kernel with invariant
distribution proportional to $\pi^{1/T_i}$ and proposal distribution $\mathcal{N}_d(0, \Sigma^{(i)})$,

• $\{X^{(K)}_n, n \ge 0\}$ is an SRWM Markov chain with invariant distribution
proportional to $\pi^{1/T_K}$ and proposal distribution $\mathcal{N}_d(0, \Sigma^{(K)})$.

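Finally, assume that for any $i \in \{1, \dots, K\}$, $\mathbb{E}[\pi^{-1}(X^{(i)}_0)] < +\infty$. Then, for
any continuous function $f \in \mathcal{L}_{\pi^{-1/T_\star}}$, $n^{-1}\sum_{k=1}^n f(X^{(1)}_k) \to \pi(f)$, $\mathbb{P}$-a.s., as $n \to +\infty$.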

4. Proofs of Section 2.

4.1. Proof of Theorem 2.1. We preface the proof with a lemma, which is
proved in Atchade et al. (2011), Proposition 1.7.1.

Lemma 4.1. For any integers $n, N > 0$,

$$
\sup_{\|f\|_\infty \le 1} \big|\mathbb{E}[f(X_{n+N}) \mid \mathcal{F}_n] - P^N_{\theta_n} f(X_n)\big| \le \sum_{j=1}^{N-1} \mathbb{E}[D(\theta_{n+j}, \theta_n) \mid \mathcal{F}_n], \qquad \mathbb{P}\text{-a.s.}
$$
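Proof of Theorem 2.1. Let $f$ be a bounded nonnegative function.
Without loss of generality, assume that $\|f\|_\infty \le 1$. For any $N \le n$,

$$
\begin{aligned}
|\mathbb{E}[f(X_n)] - a| \le{} & \big|\mathbb{E}[f(X_n) - P^N_{\theta_{n-N}} f(X_{n-N})]\big| + \big|\mathbb{E}[\pi_{\theta_{n-N}}(f)] - a\big| \\
& + \big|\mathbb{E}[P^N_{\theta_{n-N}} f(X_{n-N}) - \pi_{\theta_{n-N}}(f)]\big|.
\end{aligned}
\tag{19}
$$

Let $\epsilon > 0$. By setting $N = r_\epsilon(n)$, where the sequence $\{r_\epsilon(n), n \ge 0\}$ is as
in A2(a), the third term on the RHS of (19) is bounded by

$$
\mathbb{E}\Big[\big\|P^{r_\epsilon(n)}_{\theta_{n-r_\epsilon(n)}}(X_{n-r_\epsilon(n)}, \cdot) - \pi_{\theta_{n-r_\epsilon(n)}}\big\|_{\mathrm{TV}}\Big].
$$

Under A2(a), for any large $n$, this expectation is upper bounded by $\epsilon$.
Lemma 4.1 shows that

$$
\big|\mathbb{E}[f(X_n) - P^{r_\epsilon(n)}_{\theta_{n-r_\epsilon(n)}} f(X_{n-r_\epsilon(n)})]\big| \le \sum_{j=1}^{r_\epsilon(n)-1} \mathbb{E}\big[D(\theta_{n-r_\epsilon(n)+j}, \theta_{n-r_\epsilon(n)})\big].
$$

Under A2(b), the RHS tends to zero as $n \to +\infty$. Finally, the remaining
term in (19) converges to zero as a consequence of the a.s. convergence of
$\{\pi_{\theta_n}(f), n \ge 0\}$ to $a$, of dominated convergence (since $0 \le \pi_\theta(f) \le 1$), and of the property $\lim_n (n - r_\epsilon(n)) = +\infty$. □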



4.2. Proof of Lemma 2.5. The proof of (i) follows by iterating the drift
inequality in Saksman and Vihola (2010), Proposition 15. We now prove (ii).
Saksman and Vihola [(2010), Proposition 15] implies that there exists a
constant $c$ such that, on the set $\{\sup_{k \le n-1} k^{-\tau} |\theta_k| \le t\}$,

$$
\sup_{k \le n-1} \lambda_{\theta_k} \le 1 - \big(c\, t^{d/2} n^{\tau d/2}\big)^{-1}, \qquad \mathbb{P}\text{-a.s.}
$$

Then, iterating the drift inequality in Saksman and Vihola [(2010), Proposition
15] yields

$$
\begin{aligned}
\mathbb{E}\big[W(X_n)\, \mathbb{1}_{\{\sup_{k \le n-1} k^{-\tau} |\theta_k| \le t\}}\big] &\le \mathbb{E}[W(X_0)] + b \sum_{k=0}^{n-1} \big(1 - (c\, t^{d/2} n^{\tau d/2})^{-1}\big)^k \\
&\le \mathbb{E}[W(X_0)] + b\, c\, t^{d/2} n^{\tau d/2}.
\end{aligned}
$$

The last assertion follows from (10), (ii) and the Markov inequality: let $\epsilon,
\tau > 0$; choose $t_\epsilon$ and $\tau' > 0$ such that $\tau - \tau' d/2 > 0$ and $\mathbb{P}(\sup_{n \ge 1} |\theta_n| n^{-\tau'} >
t_\epsilon) \le \epsilon/2$. Then

$$
\begin{aligned}
\mathbb{P}\Big(\sup_n n^{-1-\tau} W(X_n) > M\Big) &\le \epsilon/2 + \mathbb{P}\Big(\sup_n n^{-1-\tau} W(X_n) > M,\ \sup_{n \ge 1} |\theta_n| n^{-\tau'} \le t_\epsilon\Big) \\
&\le \epsilon/2 + \frac{1}{M}\, \mathbb{E}\Big[\sup_n n^{-1-\tau} W(X_n)\, \mathbb{1}_{\{\sup_{n \ge 1} |\theta_n| n^{-\tau'} \le t_\epsilon\}}\Big] \\
&\le \epsilon/2 + \frac{C}{M} \sum_{n \ge 1} \frac{n^{\tau' d/2}}{n^{1+\tau}}
\end{aligned}
$$

for some constant $C$, and the RHS is upper bounded by $\epsilon$ for large enough $M$.

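4.3. Proof of Theorem 2.7. The proof of Theorem 2.7 is prefaced by
lemmas on the regularity in $\theta$ of the invariant distribution $\pi_\theta$ and of the
function $\hat F_\theta$, solution of the Poisson equation $\hat F_\theta - P_\theta \hat F_\theta = F(\cdot,\theta) - \pi_\theta(F(\cdot,\theta))$.

Under A3, $\hat F_\theta(x) \stackrel{\mathrm{def}}{=} \sum_{n \ge 0} P^n_\theta\{F(\cdot,\theta) - \pi_\theta(F(\cdot,\theta))\}(x)$ exists for all $x \in \mathsf{X}$,
solves the Poisson equation, and, by Lemma 2.3,

$$
|\hat F_\theta(x)| \le \|F(\cdot,\theta)\|_V\, L_\theta^2\, V(x), \tag{20}
$$

where $L_\theta$ is defined in (5).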
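On a finite state space, the series defining $\hat F_\theta$ can be summed directly and the Poisson equation checked. The sketch below is illustrative only: it fixes $\theta$ (so we write $P$ for $P_\theta$), uses a hypothetical $3 \times 3$ kernel, and truncates the series after a fixed number of terms, which is harmless here because the eigenvalues of this kernel other than 1 are 0.3 and 0.2.

```python
def mat_vec(P, v):
    """(P v)(x) = sum_y P(x, y) v(y)."""
    return [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(P))]


def stationary(P, iters=2000):
    """Invariant distribution by power iteration on the row vector mu P."""
    m = len(P)
    mu = [1.0 / m] * m
    for _ in range(iters):
        mu = [sum(mu[i] * P[i][j] for i in range(m)) for j in range(m)]
    return mu


def poisson_solution(P, F, n_terms=500):
    """F_hat = sum_{n >= 0} P^n (F - pi(F)), truncated after n_terms terms."""
    pi = stationary(P)
    piF = sum(p * f for p, f in zip(pi, F))
    term = [f - piF for f in F]
    F_hat = [0.0] * len(F)
    for _ in range(n_terms):
        F_hat = [a + b for a, b in zip(F_hat, term)]
        term = mat_vec(P, term)
    return pi, piF, F_hat


P = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]]
F = [1.0, -2.0, 5.0]
pi, piF, F_hat = poisson_solution(P, F)
PF_hat = mat_vec(P, F_hat)
residual = max(abs(F_hat[x] - PF_hat[x] - (F[x] - piF)) for x in range(3))
print(F_hat, residual)
```

The solution built this way is the one centred under $\pi$ (that is, $\pi(\hat F) = 0$); any other solution of the Poisson equation differs from it by a constant.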

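The following lemma is adapted from Andrieu et al. (2011). A detailed
proof is given in Section 3 of the supplemental paper [Fort, Moulines and
Priouret (2011)].

Lemma 4.2. Assume A3. For any $\theta \in \Theta$, let $F_\theta: \mathsf{X} \to \mathbb{R}_+$ be a measurable
function such that $\sup_\theta \|F_\theta\|_V < +\infty$, and define $\hat F_\theta \stackrel{\mathrm{def}}{=} \sum_{n \ge 0} P^n_\theta\{F_\theta - \pi_\theta(F_\theta)\}$. For any $\theta, \theta' \in \Theta$,

$$
\|\pi_\theta - \pi_{\theta'}\|_V \le L^2_{\theta'}\big\{\pi_\theta(V) + L^2_\theta V(x)\big\}\, D_V(\theta, \theta')
$$

and

$$
\|P_\theta \hat F_\theta - P_{\theta'} \hat F_{\theta'}\|_V \le \sup_{\vartheta \in \Theta} \|F_\vartheta\|_V\, L^2_{\theta'}\big(L^2_\theta D_V(\theta, \theta') + \|\pi_\theta - \pi_{\theta'}\|_V\big) + L^2_{\theta'} \|F_\theta - F_{\theta'}\|_V,
$$

where $L_\theta$ is given by (5).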

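Proof of Theorem 2.7. We denote by $L$ the limit $\lim_n \int \pi_{\theta_n}(dx) F(x, \theta_n)$. We write $n^{-1}\sum_{k=0}^{n-1} F(X_k, \theta_k) - L = \sum_{i=1}^{4} T_{i,n}$ with

$$
\begin{aligned}
T_{1,n} &= \frac{1}{n} F(X_0, \theta_0), \\
T_{2,n} &= \frac{1}{n} \sum_{k=1}^{n-1} \{F(X_k, \theta_k) - F(X_k, \theta_{k-1})\}, \\
T_{3,n} &= \frac{1}{n} \sum_{k=1}^{n-1} \Big\{F(X_k, \theta_{k-1}) - \int \pi_{\theta_{k-1}}(dx) F(x, \theta_{k-1})\Big\}, \\
T_{4,n} &\stackrel{\mathrm{def}}{=} \frac{1}{n} \sum_{k=0}^{n-2} \int \pi_{\theta_k}(dx) F(x, \theta_k) - L.
\end{aligned}
$$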
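Consider first $T_{1,n}$. Since $|F(X_0, \theta_0)| < +\infty$ $\mathbb{P}$-a.s., $\lim_{n \to \infty} T_{1,n} = 0$ $\mathbb{P}$-a.s.
Under condition (ii) [resp., (iii)], $T_{2,n}$ (resp., $T_{4,n}$) converges to zero a.s.
(for $T_{2,n}$, note that $L_\theta \ge 1$ by definition). Consider finally $T_{3,n}$:

$$
\frac{1}{n} \sum_{k=1}^{n-1} \Big\{F(X_k, \theta_{k-1}) - \int \pi_{\theta_{k-1}}(dx) F(x, \theta_{k-1})\Big\} = M_n + R_n + \tilde R_n,
$$

with $\hat F_\theta(x) = \sum_{n \ge 0} P^n_\theta\{F(\cdot,\theta) - \pi_\theta(F(\cdot,\theta))\}(x)$ and

$$
\begin{aligned}
M_n &= \frac{1}{n} \sum_{k=1}^{n-1} \big\{\hat F_{\theta_{k-1}}(X_k) - P_{\theta_{k-1}} \hat F_{\theta_{k-1}}(X_{k-1})\big\}, \\
R_n &= \frac{1}{n} \sum_{k=1}^{n-1} \big\{P_{\theta_k} \hat F_{\theta_k}(X_k) - P_{\theta_{k-1}} \hat F_{\theta_{k-1}}(X_k)\big\}, \\
\tilde R_n &= \frac{1}{n} P_{\theta_0} \hat F_{\theta_0}(X_0) - \frac{1}{n} P_{\theta_{n-1}} \hat F_{\theta_{n-1}}(X_{n-1}).
\end{aligned}
$$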

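By construction, $\{\hat F_{\theta_{k-1}}(X_k) - P_{\theta_{k-1}} \hat F_{\theta_{k-1}}(X_{k-1}), k \ge 1\}$ is a martingale-increment
sequence. Therefore, by Hall and Heyde [(1980), Theorem 2.18],
$M_n \to 0$ $\mathbb{P}$-a.s. provided that

$$
\sum_{k \ge 1} k^{-\alpha}\, \mathbb{E}\big[|\hat F_{\theta_{k-1}}(X_k) - P_{\theta_{k-1}} \hat F_{\theta_{k-1}}(X_{k-1})|^\alpha \,\big|\, \mathcal{F}_{k-1}\big] < +\infty, \qquad \mathbb{P}\text{-a.s.} \tag{21}
$$

Equation (20) and Jensen's inequality imply that ($\alpha > 1$)

$$
\begin{aligned}
\mathbb{E}\big[|\hat F_{\theta_{k-1}}(X_k) - P_{\theta_{k-1}} \hat F_{\theta_{k-1}}(X_{k-1})|^\alpha \,\big|\, \mathcal{F}_{k-1}\big]
&\le 2^{\alpha-1}\, \mathbb{E}\big[|\hat F_{\theta_{k-1}}(X_k)|^\alpha + |P_{\theta_{k-1}} \hat F_{\theta_{k-1}}(X_{k-1})|^\alpha \,\big|\, \mathcal{F}_{k-1}\big] \\
&\le 2^\alpha \Big(\sup_{\vartheta} \|F(\cdot,\vartheta)\|_V\, L^2_{\theta_{k-1}}\Big)^\alpha P_{\theta_{k-1}} V^\alpha(X_{k-1}).
\end{aligned}
$$

Under item (i) and A5(b), the series is finite $\mathbb{P}$-a.s., which concludes the
proof of (21). Consider now the remainder term $R_n$. By Lemma 4.2, for some finite constant $C$,

$$
|R_n| \le \frac{C}{n} \sum_{k=1}^{n-1} L^2_{\theta_k} L^2_{\theta_{k-1}} \big\{1 + \pi_{\theta_k}(V) + L^2_{\theta_k}\big\}\, D_V(\theta_k, \theta_{k-1})\, V(X_k) + \frac{C}{n} \sum_{k=1}^{n-1} L^2_{\theta_k} \|F(\cdot,\theta_k) - F(\cdot,\theta_{k-1})\|_V\, V(X_k).
$$

Assumptions A4, A5(a) and items (i), (ii) imply that $R_n \to 0$. Consider
finally $\tilde R_n$. By (20),

$$
\frac{1}{n} \big|P_{\theta_0} \hat F_{\theta_0}(X_0) - P_{\theta_{n-1}} \hat F_{\theta_{n-1}}(X_{n-1})\big| \le \sup_{\vartheta} \|F(\cdot,\vartheta)\|_V\, \frac{1}{n}\Big(L^2_{\theta_0}\{V(X_0) + b_{\theta_0}\} + L^2_{\theta_{n-1}} P_{\theta_{n-1}} V(X_{n-1})\Big).
$$

Assumption A5(b), item (i) and the condition $V(X_0) < +\infty$ $\mathbb{P}$-a.s. imply
that $\tilde R_n \to 0$. □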
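The decomposition $T_{3,n} = M_n + R_n + \tilde R_n$ used above is purely algebraic once each $\hat F_\theta$ solves its Poisson equation, so it can be checked exactly on a toy example. The sketch below assumes a hypothetical family of two $3 \times 3$ kernels $P_0, P_1$, an arbitrary function $F(x, \theta)$, and arbitrary (not simulated) sequences $X_k$ and $\theta_k$; it verifies that $n\, T_{3,n}$ and $n(M_n + R_n + \tilde R_n)$ agree up to rounding.

```python
import random


def mat_vec(P, v):
    return [sum(P[i][j] * v[j] for j in range(len(v))) for i in range(len(P))]


def stationary(P, iters=2000):
    m = len(P)
    mu = [1.0 / m] * m
    for _ in range(iters):
        mu = [sum(mu[i] * P[i][j] for i in range(m)) for j in range(m)]
    return mu


def poisson(P, F, n_terms=500):
    """Return (pi(F), F_hat) with F_hat = sum_{n>=0} P^n (F - pi(F)), truncated."""
    pi = stationary(P)
    piF = sum(p * f for p, f in zip(pi, F))
    term, F_hat = [f - piF for f in F], [0.0] * len(F)
    for _ in range(n_terms):
        F_hat = [a + b for a, b in zip(F_hat, term)]
        term = mat_vec(P, term)
    return piF, F_hat


kernels = [
    [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]],    # P_0
    [[0.1, 0.6, 0.3], [0.4, 0.4, 0.2], [0.25, 0.25, 0.5]],  # P_1
]
F = [[1.0, -2.0, 5.0], [0.5, 3.0, -1.0]]                    # F(x, theta) = F[theta][x]
piF, F_hat, PF_hat = [], [], []
for t in (0, 1):
    c, h = poisson(kernels[t], F[t])
    piF.append(c)
    F_hat.append(h)
    PF_hat.append(mat_vec(kernels[t], h))                   # P_theta F_hat_theta

rng = random.Random(2)
n = 200
X = [rng.randrange(3) for _ in range(n)]                    # any sequences work here
th = [rng.randrange(2) for _ in range(n)]

lhs = sum(F[th[k - 1]][X[k]] - piF[th[k - 1]] for k in range(1, n))          # n * T_{3,n}
M = sum(F_hat[th[k - 1]][X[k]] - PF_hat[th[k - 1]][X[k - 1]] for k in range(1, n))
R = sum(PF_hat[th[k]][X[k]] - PF_hat[th[k - 1]][X[k]] for k in range(1, n))
R_tilde = PF_hat[th[0]][X[0]] - PF_hat[th[n - 1]][X[n - 1]]
print(lhs, M + R + R_tilde)
```

Because the identity holds for any sequences, the probabilistic content of the proof lies entirely in the three convergence arguments, not in the decomposition itself.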

4.4. Proof of Theorem 2.11. We preface the proof of this theorem with
a proposition and a lemma. The proof of Proposition 4.3 is postponed to
Fort, Moulines and Priouret (2011), Section 4.

Proposition 4.3. Let $\mathsf{X}$ be a Polish space endowed with its Borel $\sigma$-field $\mathcal{X}$. Let $\mu$ and $\{\mu_n, n \ge 1\}$ be probability distributions on $(\mathsf{X}, \mathcal{X})$. Let
$\{h_n, n \ge 0\}$ be an equicontinuous family of functions from $\mathsf{X}$ to $\mathbb{R}$. Assume:

(i) the sequence $\{\mu_n, n \ge 0\}$ converges weakly to $\mu$,
(ii) for any $x \in \mathsf{X}$, $\lim_{n \to \infty} h_n(x)$ exists, and there exists $\alpha > 1$ such that
$\sup_n \mu_n(|h_n|^\alpha) + \mu(|\lim_n h_n|) < +\infty$.

Then $\mu_n(h_n) \to \mu(\lim_n h_n)$.

Lemma 4.4. Let $\mathsf{X}$ be a Polish space endowed with its Borel $\sigma$-field $\mathcal{X}$.
Let $\{P_\theta, \theta \in \Theta\}$ be a family of transition kernels on $(\mathsf{X}, \mathcal{X})$ and $\{\theta_n, n \ge 0\}$ be
a $\Theta$-valued random sequence on $(\Omega, \mathcal{A}, \mathbb{P})$. Assume conditions (ii) and (iii)
of Theorem 2.11. Then there exists a $\mathbb{P}$-full set $\Omega_\star$ such that, for any $\omega \in \Omega_\star$,
$x \in \mathsf{X}$ and $k \ge 1$, the probability distributions $\{P^k_{\theta_n(\omega)}(x, \cdot), n \ge 0\}$ converge
weakly to $P^k_{\theta_\star}(x, \cdot)$.



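Proof. We prove, by induction on $k$, that there exists a $\mathbb{P}$-full set $\Omega_k$
such that, for any $\omega \in \Omega_k$ and $x \in \mathsf{X}$, the probability distributions $\{P^k_{\theta_n(\omega)}(x, \cdot),
n \ge 0\}$ converge weakly to $P^k_{\theta_\star}(x, \cdot)$. The proof is then concluded by setting
$\Omega_\star = \bigcap_{k \ge 1} \Omega_k$.

Consider the case $k = 1$. By condition (iii) of Theorem 2.11, for any $x \in \mathsf{X}$
there exists a $\mathbb{P}$-full set $\Omega_x$ such that, for any $\omega \in \Omega_x$, $\{P_{\theta_n(\omega)}(x, \cdot), n \ge 0\}$
converges weakly to $P_{\theta_\star}(x, \cdot)$. Since $\mathsf{X}$ is Polish, it admits a countable dense
subset $\mathcal{D}$. Therefore, there exists a $\mathbb{P}$-full set $\Omega_{\mathcal{D}}$ such that, for any $\omega \in \Omega_{\mathcal{D}}$
and any $x \in \mathcal{D}$, $\{P_{\theta_n(\omega)}(x, \cdot), n \ge 0\}$ converges weakly to $P_{\theta_\star}(x, \cdot)$. Under
condition (ii) of Theorem 2.11, for any bounded continuous function $f$, the
family of functions $\{\bar P_\theta f \stackrel{\mathrm{def}}{=} P_\theta f - P_{\theta_\star} f, \theta \in \Theta\}$ is equicontinuous. For any
$\epsilon > 0$ and any $x \in \mathsf{X}$, there thus exists $x_\epsilon \in \mathcal{D}$ such that, for any $\theta \in \Theta$,
$|\bar P_\theta f(x) - \bar P_\theta f(x_\epsilon)| \le \epsilon$. Hence, for any $\omega \in \Omega_{\mathcal{D}}$ and any bounded continuous
function $f$,

$$
\begin{aligned}
|\bar P_{\theta_n(\omega)} f(x)| &\le |\bar P_{\theta_n(\omega)} f(x_\epsilon)| + |\bar P_{\theta_n(\omega)} f(x) - \bar P_{\theta_n(\omega)} f(x_\epsilon)| \\
&\le |P_{\theta_n(\omega)} f(x_\epsilon) - P_{\theta_\star} f(x_\epsilon)| + \epsilon.
\end{aligned}
$$

This implies that $\limsup_n |\bar P_{\theta_n(\omega)} f(x)| \le \epsilon$. Since $\epsilon$ was arbitrary, it follows that
$\{P_{\theta_n(\omega)}(x, \cdot), n \ge 0\}$ converges weakly to $P_{\theta_\star}(x, \cdot)$ for any $x$. Hence, we set

$$
\Omega_1 = \Omega_{\mathcal{D}}.
$$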

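Assume that the property holds for some $k \ge 1$. We write, for any bounded and
continuous function $f$,

$$
\begin{aligned}
P^{k+1}_{\theta_n(\omega)} f(x) - P^{k+1}_{\theta_\star} f(x) ={} & \int \big(P^k_{\theta_n(\omega)}(x, dy) - P^k_{\theta_\star}(x, dy)\big) P_{\theta_\star} f(y) \\
& + \int P^k_{\theta_n(\omega)}(x, dy) \big(P_{\theta_n(\omega)} f(y) - P_{\theta_\star} f(y)\big).
\end{aligned}
\tag{22}
$$

By the induction assumption, there exists a $\mathbb{P}$-full set $\Omega_k$ such that, for any
$\omega \in \Omega_k$, $x \in \mathsf{X}$ and any bounded continuous function $h$, $\lim_{n \to \infty} P^k_{\theta_n(\omega)} h(x) =
P^k_{\theta_\star} h(x)$. Applied with $h = P_{\theta_\star} f$, which is continuous under
assumption (ii), this proves that, for any $\omega \in \Omega_k$, the first term on the RHS of (22)
goes to zero. For the second term, we use Proposition 4.3. Let $\omega \in \Omega_k \cap \Omega_1$.
For any $x \in \mathsf{X}$, $\{P^k_{\theta_n(\omega)}(x, \cdot), n \ge 0\}$ converges weakly to $P^k_{\theta_\star}(x, \cdot)$. Furthermore,
the family of bounded functions $\{P_{\theta_n(\omega)} f - P_{\theta_\star} f, n \ge 0\}$ is equicontinuous
and, since $\omega \in \Omega_1$, $\lim_{n \to \infty} P_{\theta_n(\omega)} f(y) - P_{\theta_\star} f(y) = 0$ for any $y \in \mathsf{X}$.
Proposition 4.3 thus implies that the second term on the RHS of (22) converges
to zero, for any bounded continuous function $f$. The above discussion
proves that we can take $\Omega_{k+1} = \Omega_k \cap \Omega_1 = \Omega_1$, and concludes the induction. □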

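Proof of Theorem 2.11. Fix $x \in \mathsf{X}$. Let $f$ be a bounded continuous
function on $\mathsf{X}$. Under A3, we have by Lemma 2.3

$$
\limsup_n \big|\pi_{\theta_n}(f) - P^k_{\theta_n} f(x) + P^k_{\theta_\star} f(x) - \pi_{\theta_\star}(f)\big|
\le \Big(\limsup_n C_{\theta_n} \limsup_n \rho^k_{\theta_n} + C_{\theta_\star} \rho^k_{\theta_\star}\Big) V(x).
$$

By Lemma 2.3 and condition (i), $\limsup_n C_{\theta_n} < +\infty$ and $\limsup_n \rho_{\theta_n} < 1$
$\mathbb{P}$-a.s.; then there exists a $\mathbb{P}$-full set $\Omega''$ such that, for any $\omega \in \Omega''$ and any $\epsilon > 0$, there
exists $k(\omega)$ such that

$$
\limsup_n \big|\pi_{\theta_n(\omega)}(f) - P^{k(\omega)}_{\theta_n(\omega)} f(x) + P^{k(\omega)}_{\theta_\star} f(x) - \pi_{\theta_\star}(f)\big| \le \epsilon.
$$

Note that $\Omega''$ does not depend upon $x$ and $f$. By Lemma 4.4, there exists
a $\mathbb{P}$-full set $\Omega_\star$ such that $\lim_{n \to \infty} P^k_{\theta_n(\omega)} f(x) = P^k_{\theta_\star} f(x)$ for any $\omega \in \Omega_\star$, any
$x \in \mathsf{X}$, any $k \ge 1$ and any bounded continuous function $f$. The proof is
concluded by setting $\Omega_+ = \Omega'' \cap \Omega_\star$. □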

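5. Proofs of Section 3.

5.1. Proof of Corollary 3.2. (i) By iterating the drift inequality (17), we
obtain

$$
\mathbb{E}[W(X_n)] \le \tilde\lambda^n\, \mathbb{E}[W(X_0)] + \tilde b \sum_{k=0}^{n-1} \tilde\lambda^k\, \mathbb{E}[\theta_{n-1-k}(W)].
$$

Under I3(b), $\sup_{k \ge 0} \mathbb{E}[\theta_k(W)] < +\infty$, so that

$$
\mathbb{E}[W(X_n)] \le \tilde\lambda^n\, \mathbb{E}[W(X_0)] + \frac{\tilde b}{1 - \tilde\lambda} \sup_{k \ge 0} \mathbb{E}[\theta_k(W)]. \tag{23}
$$

(ii) Since $W$ is a continuous function, I3(a) implies that $\limsup_n \theta_n(W) <
+\infty$ $\mathbb{P}$-a.s. Consequently, $\limsup_n L_{\theta_n} < +\infty$ $\mathbb{P}$-a.s. by Lemma 2.3 and
Proposition 3.1.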
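The iteration behind (23) is a scalar linear recursion. As a hedged illustration (the numbers are arbitrary, not taken from the paper), the sketch below runs the worst case $u_{n+1} = \tilde\lambda u_n + \tilde b s_n$ of the drift inequality in expectation, with $s_n$ standing for $\mathbb{E}[\theta_n(W)]$, and checks the bound $u_n \le \tilde\lambda^n u_0 + \tilde b \sup_k s_k / (1 - \tilde\lambda)$.

```python
import random


def drift_recursion(lam, b, u0, s):
    """Worst case of (17) in expectation: u_{n+1} = lam * u_n + b * s_n, with u_0 = E[W(X_0)]."""
    u = [u0]
    for s_n in s:
        u.append(lam * u[-1] + b * s_n)
    return u


rng = random.Random(3)
lam, b, u0 = 0.8, 2.0, 50.0
s = [1.0 + 2.0 * rng.random() for _ in range(200)]   # s_n = E[theta_n(W)], bounded as under I3(b)
u = drift_recursion(lam, b, u0, s)
bound = [lam ** n * u0 + b * max(s) / (1.0 - lam) for n in range(len(u))]
print(all(x <= y + 1e-9 for x, y in zip(u, bound)))
```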

5.2. Proof of Proposition 3.3. We check the conditions of Theorem 2.11.
Condition (i) of Theorem 2.11 holds by Corollary 3.2.

The proof of condition (ii) of Theorem 2.11 is a consequence of the
following lemma.

Lemma 5.1. Let $f$ be a function on $\mathsf{X}$ such that $\|f \pi^\beta\|_\infty < +\infty$. For
any $x, x' \in \mathsf{X}$ such that $\pi(x) > 0$ and $\pi(x') > 0$,

$$
\sup_{\theta \in \Theta} |P_\theta f(x) - P_\theta f(x')| \le |Pf(x) - Pf(x')| + |f(x) - f(x')| + 2\|f \pi^\beta\|_\infty\, |\pi^{-\beta}(x) - \pi^{-\beta}(x')|.
$$

Proof. By definition of the transition kernel $P_\theta$, it is easily checked
that

$$
\begin{aligned}
P_\theta f(x) - P_\theta f(x') ={} & \nu \int \{\alpha(x,y) - \alpha(x',y)\}(f(y) - f(x'))\, \theta(dy) \\
& + (1-\nu)(Pf(x) - Pf(x')) + \nu (f(x) - f(x'))\, A(\theta, x),
\end{aligned}
\tag{24}
$$

where $A(\theta, x) = 1 - \int \alpha(x,y)\, \theta(dy)$. Since $0 \le \alpha(x,y) \le 1$, we have
$|\nu(f(x) - f(x'))\, A(\theta, x)| \le |f(x) - f(x')|$.
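We can assume w.l.o.g. that $\pi(x) \le \pi(x')$. By definition of the ratio $\alpha$, we
have

$$
\alpha(x,y) - \alpha(x',y) = \mathbb{1}_{\{\pi(y) \le \pi(x)\}} \big(\pi^{-\beta}(x) - \pi^{-\beta}(x')\big)\pi^\beta(y) + \mathbb{1}_{\{\pi(x) < \pi(y) < \pi(x')\}} \big(\pi^{-\beta}(y) - \pi^{-\beta}(x')\big)\pi^\beta(y),
$$

showing that $|\alpha(x,y) - \alpha(x',y)| \le (\pi^{-\beta}(x) - \pi^{-\beta}(x'))\, \pi^\beta(y)\, \mathbb{1}_{\{\pi(y) < \pi(x')\}}$. The
proof is concluded by noting that, since $\pi^\beta(y)|f(x')| \le \pi^\beta(x')|f(x')|$ on $\{\pi(y) < \pi(x')\}$,

$$
\int |\alpha(x,y) - \alpha(x',y)|\, |f(y) - f(x')|\, \theta(dy) \le 2\Big(\sup |f| \pi^\beta\Big)\big(\pi^{-\beta}(x) - \pi^{-\beta}(x')\big). \qquad \square
$$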
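Identity (24), the case analysis for $\alpha(x,y) - \alpha(x',y)$, and the bound of Lemma 5.1 can all be tested on random finite-state instances. The sketch below assumes a finite $\mathsf{X}$, a random stochastic matrix for $P$ (its invariance plays no role in these algebraic facts), and random $\pi$, $\theta$, $f$, $\nu$, $\beta$; the helper names are ours.

```python
import random


def make_instance(m, rng):
    """Random pi > 0, probability theta, f, row-stochastic P, nu and beta in (0, 1)."""
    pi = [rng.uniform(0.1, 1.0) for _ in range(m)]
    w = [rng.random() for _ in range(m)]
    theta = [v / sum(w) for v in w]
    f = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    P = []
    for _ in range(m):
        row = [rng.random() for _ in range(m)]
        P.append([v / sum(row) for v in row])
    return pi, theta, f, P, rng.uniform(0.05, 0.95), rng.uniform(0.05, 0.95)


def check(m=6, seed=0, tol=1e-12):
    """Check (24), the bound on |a(x,y) - a(x',y)|, and Lemma 5.1 for all pairs x, x'."""
    rng = random.Random(seed)
    pi, theta, f, P, nu, beta = make_instance(m, rng)

    def alpha(x, y):
        return min(1.0, (pi[y] / pi[x]) ** beta)

    Pf = [sum(P[x][y] * f[y] for y in range(m)) for x in range(m)]

    def p_theta_f(x):
        # P_theta f(x) = (1 - nu) P f(x) + nu K_theta f(x), with K_theta as in (14)
        kf = sum(theta[y] * (alpha(x, y) * f[y] + (1.0 - alpha(x, y)) * f[x]) for y in range(m))
        return (1.0 - nu) * Pf[x] + nu * kf

    sup_f_pi = max(abs(f[y]) * pi[y] ** beta for y in range(m))
    ok = True
    for x in range(m):
        A = 1.0 - sum(alpha(x, y) * theta[y] for y in range(m))
        for xp in range(m):
            diff = p_theta_f(x) - p_theta_f(xp)
            rhs24 = (nu * sum((alpha(x, y) - alpha(xp, y)) * (f[y] - f[xp]) * theta[y] for y in range(m))
                     + (1.0 - nu) * (Pf[x] - Pf[xp]) + nu * (f[x] - f[xp]) * A)
            ok = ok and abs(diff - rhs24) < tol                              # identity (24)
            if pi[x] <= pi[xp]:
                gap = pi[x] ** -beta - pi[xp] ** -beta
                for y in range(m):
                    bound = gap * pi[y] ** beta if pi[y] < pi[xp] else 0.0
                    ok = ok and abs(alpha(x, y) - alpha(xp, y)) <= bound + tol  # case analysis
            lemma = (abs(Pf[x] - Pf[xp]) + abs(f[x] - f[xp])
                     + 2.0 * sup_f_pi * abs(pi[x] ** -beta - pi[xp] ** -beta))
            ok = ok and abs(diff) <= lemma + tol                             # Lemma 5.1
    return ok


print(all(check(seed=s) for s in range(20)))
```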

The most delicate part consists in establishing condition (iii) of Theorem
2.11. The proof relies on the following result, which is an extension of
the Varadarajan theorem [Dudley (2002), Theorem 11.4.1]. The proof of
Proposition 5.2 is detailed in Section 5 of the supplemental paper [Fort,
Moulines and Priouret (2011)].

Proposition 5.2. Let $(\mathsf{U}, d)$ be a metric space equipped with its Borel
$\sigma$-field $\mathcal{B}(\mathsf{U})$. Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space, $\mu$ be a distribution on
$(\mathsf{U}, \mathcal{B}(\mathsf{U}))$ and $\{K_n, n \ge 0\}$ be a family of Markov transition kernels $K_n: \Omega \times
\mathcal{B}(\mathsf{U}) \to [0,1]$. Assume that, for any $f \in C_b(\mathsf{U}, d)$,

$$
\Omega_f \stackrel{\mathrm{def}}{=} \Big\{\omega \in \Omega : \limsup_n |K_n(\omega, f) - \mu(f)| = 0\Big\}
$$

is a $\mathbb{P}$-full set. Then

$$
\Big\{\omega : \forall f \in C_b(\mathsf{U}, d),\ \limsup_n |K_n(\omega, f) - \mu(f)| = 0\Big\}
$$

is a $\mathbb{P}$-full set.

Proof of (iii) of Theorem 2.11. We check the conditions of Proposition
5.2 with $K_n(\omega, \cdot) = P_{\theta_n(\omega)}(x, \cdot)$ and $\mu = P_{\theta_\star}(x, \cdot)$. For any $x \in \mathsf{X}$ and $f \in C_b(\mathsf{X})$,
the functions $y \mapsto \alpha(x,y)$ and $y \mapsto \alpha(x,y) f(y)$ are bounded and continuous. Thus, I3(a) implies that
$P_{\theta_n} f(x) \to P_{\theta_\star} f(x)$ and $\Omega_f$ is a $\mathbb{P}$-full set. □

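5.3. Proof of Theorem 3.6. Set $a_0 = 1$ and choose $a_i > 1$ such that $T
\prod_{\ell=0}^{K} a_\ell = T_\star$. The proof is by induction on $i$, for $i = K$ down to $i = 2$.

Set $W^{(K-1)} \stackrel{\mathrm{def}}{=} \pi^{-T_\star^{-1} \prod_{\ell=0}^{K-1} a_\ell} = \pi^{-1/(T a_K)}$ and let $\pi^{(K-1)}$ be the probability
distribution proportional to $\pi^{1/T_{K-1}}$. Under the stated assumptions, Theorem
3.5 applies with $Y \leftarrow X^{(K)}$ and $X \leftarrow X^{(K-1)}$: for any continuous
function $f$ in $\mathcal{L}_{W^{(K-1)}}$, $n^{-1}\sum_{k=1}^n f(X^{(K-1)}_k) \to \pi^{(K-1)}(f)$, $\mathbb{P}$-a.s.

Assume that Theorem 3.5 holds with $Y \leftarrow X^{(i+1)}$ and $X \leftarrow X^{(i)}$ for some $i \in
\{2, \dots, K-1\}$: for any continuous function $f$ in $\mathcal{L}_{W^{(i)}}$, $n^{-1}\sum_{k=1}^n f(X^{(i)}_k) \to
\pi^{(i)}(f)$, where $W^{(i)} = \pi^{-T_\star^{-1} \prod_{\ell=0}^{i} a_\ell}$ and $\pi^{(i)} \propto \pi^{1/T_i}$. We apply the above
results with

$$
\pi \leftarrow \pi^{1/T_{i-1}}, \qquad \theta_\star \leftarrow \pi^{(i)}, \qquad P \leftarrow P^{(i-1)}, \qquad T \leftarrow T_i / T_{i-1}.
$$

We thus have that $n^{-1}\sum_{k=1}^n f(X^{(i-1)}_k) \to \pi^{(i-1)}(f)$ for any continuous
function $f$ in $\mathcal{L}_{W^{(i-1)}}$, where $W^{(i-1)} = \pi^{-T_\star^{-1} \prod_{\ell=0}^{i-1} a_\ell}$ and $\pi^{(i-1)} \propto \pi^{1/T_{i-1}}$.

This concludes the induction. For $i = 2$, it yields the SLLN for $X^{(1)}$ in $\mathcal{L}_{W^{(1)}}$, which contains $\mathcal{L}_{\pi^{-1/T_\star}}$ since $W^{(1)} = \pi^{-a_1/T_\star}$ with $a_1 > 1$ and $\pi$ is bounded.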
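The exponent bookkeeping of this induction can be made concrete. Under our reading of the proof, $W^{(i)} = \pi^{-e_i}$ with $e_i = T_\star^{-1} \prod_{\ell=0}^{i} a_\ell$, and the argument needs $e_i < 1/T_i$ for $i \in \{2, \dots, K\}$ (an admissible drift exponent at each stage), $W^{(i-1)} = (W^{(i)})^{1/a_i}$ with $1/a_i < 1$, $e_K = 1/T$, and $e_1 > 1/T_\star$. The sketch checks these relations for a hypothetical ladder and one particular choice of ours, $a_1 = \cdots = a_K = (T_\star/T)^{1/K}$.

```python
def exponent_ladder(temps, T, T_star):
    """temps = [T_1, ..., T_K]; returns a = [a_0, ..., a_K] and e = [e_0, ..., e_K],
    with a_0 = 1, a_1 = ... = a_K = (T_star / T)^(1/K) (one admissible choice),
    and e_i = prod_{l <= i} a_l / T_star, so that W^(i) = pi^(-e_i)."""
    K = len(temps)
    a = [1.0] + [(T_star / T) ** (1.0 / K)] * K
    e, prod = [], 1.0
    for a_l in a:
        prod *= a_l
        e.append(prod / T_star)
    return a, e


temps = [1.0, 1.5, 2.5]   # hypothetical ladder T_1 < T_2 < T_3
T, T_star = 3.0, 4.0      # T in (T_K, T_star)
a, e = exponent_ladder(temps, T, T_star)
K = len(temps)
print(e)
print(all(e[i] < 1.0 / temps[i - 1] for i in range(2, K + 1)))   # e_i < 1/T_i
```

Since $e_i \le e_K = 1/T < 1/T_K \le 1/T_i$, the admissibility condition holds for any choice of $a_\ell > 1$ with $T \prod_{\ell} a_\ell = T_\star$, not only the equal split used here.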

5.4. Proof of Proposition 3.7. For any $i \in \{1, \dots, K\}$, the transition kernels
$P^{(i)}$ are $\pi$-irreducible and aperiodic, and compact sets are 1-small. In addition,
they are Feller (the proof follows the same lines as the proof of Lemma 5.1).
By Saksman and Vihola [(2010), Proposition 15], conditions (i) and (ii) of
Theorem 3.6 are satisfied for $i \in \{1, \dots, K\}$. Note that the proof of Proposition
15 in Saksman and Vihola (2010) is given in the case $sT_i = 1/2$, but it can be
easily adapted to any $sT_i \in (0,1)$. In the case $i = K$, this implies that there
exist $\lambda \in (0,1)$ and $b < +\infty$ such that

$$
P^{(K)} U \le \lambda U + b,
$$

where $U = (\pi / \sup_{\mathsf{X}} \pi)^{-1/T}$. Standard results on Markov chains [see, e.g.,
Meyn and Tweedie (2009)] imply (iv). By iterating the drift inequality, we
have

$$
\sup_n \mathbb{E}[U(X^{(K)}_n)] \le \mathbb{E}[U(X^{(K)}_0)] + \frac{b}{1 - \lambda},
$$

thus proving (v). Finally, since $\pi$ satisfies M1, there exist positive
constants $c_1, c_2$ such that $\pi(x) \le c_1 \exp(-c_2 |x|)$ [see, e.g., Saksman and Vihola
(2010), Lemma 8]. Therefore, for any $r > 0$, $\int \pi^r(x)\, dx < +\infty$, thus
showing (iii).
SUPPLEMENTARY MATERIAL

Supplement to paper "Convergence of adaptive and interacting Markov 
chain Monte Carlo algorithms" (DOI: 10.1214/11-AOS938SUPP; .pdf). This 
supplement provides a detailed proof of Lemma 4.2 and Propositions 3.1, 4.3 
and 5.2. It also contains a discussion on the setwise convergence of transition 
kernels. 